生产背景:
在从mysql导入到hive中,遇到如下问题:
从hive导出到mysql中,遇到如下问题:
sqoop 缺点:
1 基于命令行的操作方式,易出错,且不安全。
2 数据传输和数据格式是紧耦合的,这使得connector无法支持所有的数据格式
3 用户名和密码暴漏出来
4 sqoop安装需要root权限
Sqoop优点:
1 高效可控的利用资源,任务并行度,超时时间。
2 数据类型映射与转化,可自动进行,用户也可自定义 .
3 支持多种主流数据库,MySQL,Oracle,SQL Server,DB2等等 。
Sqoop原理:
Sqoop的inport原理:
Sqoop的export原理:
验证sqoop的各种报错:
1 mysql字段太短
2 hive的空字段转换
3 分隔符错误
4 mysql的网络不在集群网络中
5 mysql停止服务
6 mysql utf8编码只是3个字节,可能是因为某些unicode字符转成utf8之后变成了4个字节,需要mysql支持utf8mb4
7 Sqoop调式信息
8 修改生成的Java类,重新打包。
Sqoop命令行说明
生产背景:
在从mysql导入到hive中,遇到如下问题:
1) 源mysql和集群机器不在同一个网段中,导致执行导入命令,网络连接失败。
2) 某些字符导入到hive中,出现报错终止。
2.1 sqoop使用的JDBC-connector 版本太低(更换版本)。
从hive导出到mysql中,遇到如下问题:
1)某些字符插入mysql,出现报错终止。
1.1 可能mysql本身编码的限制,某些字符不支持,比如uft8和utf8mb4
1.2 sqoop使用的JDBC-connector 版本太低(更换版本)。
sqoop 缺点:
1 基于命令行的操作方式,易出错,且不安全。
2 数据传输和数据格式是紧耦合的,这使得connector无法支持所有的数据格式
3 用户名和密码暴漏出来
4 sqoop安装需要root权限
Sqoop优点:
1 高效可控的利用资源,任务并行度,超时时间。
2 数据类型映射与转化,可自动进行,用户也可自定义 .
3 支持多种主流数据库,MySQL,Oracle,SQL Server,DB2等等 。
Sqoop原理:
Sqoop的inport原理:
Sqoop在import时,需要制定split-by参数。Sqoop根据不同的split-by参数值来进行切分,然后将切分出来的区域分配到不同map中。每个map中再处理数据库中获取的一行一行的值,写入到HDFS中。同时split-by根据不同的参数类型有不同的切分方法,如比较简单的int型,Sqoop会取最大和最小split-by字段值,然后根据传入的num-mappers来确定划分几个区域。 比如select max(split_by),min(split-by) from得到的max(split-by)和min(split-by)分别为1000和1,而num-mappers为2的话,则会分成两个区域(1,500)和(501-100),同时也会分成2个sql给2个map去进行导入操作,分别为select XXX from table where split-by>=1 and split-by<500和select XXX from table where split-by>=501 and split-by<=1000。最后每个map各自获取各自SQL中的数据进行导入工作。
Sqoop的export原理:根据mysql表名称,生成一个以表名称命名的Java类,该类继承了sqoopRecord的,是一个只有Map的MR,且自定义了输出字段。
sqoop export --connect jdbc:mysql://$url:3306/$3?characterEncoding=utf8 --username $username --password $password --table $1 --export-dir $2 --input-fields-terminated-by '|' --null-non-string '0' --null-string '0';
验证sqoop的各种报错:
mysql表
create table dm_trlog (
plat varchar(20),
user_id varchar(20),
click_time varchar(20),
click_url varchar(200)
)
hive 表
CREATE TABLE TRLOG
(
PLATFORM string,
USER_ID int,
CLICK_TIME string,
CLICK_URL string
)
row format delimited
fields terminated by '\t';
sqoop list-databases –connect jdbc:mysql://192.168.119.129:3306/ –username li72 –password 123
1 mysql字段太短
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
Warning: $HADOOP_HOME is deprecated.
14/11/06 01:42:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 01:42:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 01:42:32 INFO tool.CodeGenTool: Beginning code generation
14/11/06 01:42:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:42:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:42:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/bigdata/hadoop
Note: /tmp/sqoop-root/compile/d5e37c20a9231b3253c97fc27d16d8a9/dm_trlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/06 01:42:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/d5e37c20a9231b3253c97fc27d16d8a9/dm_trlog.jar
14/11/06 01:42:38 INFO mapreduce.ExportJobBase: Beginning export of dm_trlog
14/11/06 01:42:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:42:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:42:43 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/06 01:42:43 WARN snappy.LoadSnappy: Snappy native library not loaded
14/11/06 01:42:44 INFO mapred.JobClient: Running job: job_201411060114_0001
14/11/06 01:42:45 INFO mapred.JobClient: map 0% reduce 0%
14/11/06 01:43:15 INFO mapred.JobClient: Task Id : attempt_201411060114_0001_m_000000_0, Status : FAILED
java.io.IOException: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'click_time' at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:192)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'click_time' at row 1
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4118)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4052)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2794)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
mysql字段太短了
drop table dm_trlog;
create table dm_trlog (
plat varchar(20),
user_id varchar(20),
click_time varchar(200),
click_url varchar(200)
)
2 hive的空字段转换
由于Hive的NULL用\N来表示,字段用\001来分割,换行用\n来换行,导出分隔符一定要和hive表保持一致,如果为空可以指定转换为0,有些mysql数字字段不能插入\N
加上两个参数:--input-null-string '\\N' --input-null-non-string '\\N',多加一个'\',是为转义
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
14/10/23 04:53:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/915d24128af59c9e517580e0f07411d4/dm_pc_play_kpi.jar
14/10/23 04:53:47 INFO mapreduce.ExportJobBase: Beginning export of dm_pc_play_kpi 14/10/23 04:53:48 INFO input.FileInputFormat: Total input paths to process : 1 14/10/23 04:53:48 INFO input.FileInputFormat: Total input paths to process : 1 14/10/23 04:53:48 WARN snappy.LoadSnappy: Snappy native library is available 14/10/23 04:53:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library 14/10/23 04:53:48 INFO snappy.LoadSnappy: Snappy native library loaded 14/10/23 04:53:48 INFO mapred.JobClient: Running job: job_201408301703_84117 14/10/23 04:53:49 INFO mapred.JobClient: map 0% reduce 0% 14/10/23 04:55:45 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_0, Status : FAILED java.io.IOException: Can't export data, please check task tracker logs at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.NumberFormatException: For input string: "N" at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241) at java.lang.Float.valueOf(Float.java:417) at dm_pc_play_kpi.__loadFromFields(dm_pc_play_kpi.java:335) at dm_pc_play_kpi.parse(dm_pc_play_kpi.java:282) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) ... 10 more 14/10/23 04:55:53 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_1, Status : FAILED 14/10/23 04:55:58 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_2, Status : FAILED java.io.IOException: Can't export data, please check task tracker logs at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.NumberFormatException: For input string: "N" at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241) at java.lang.Float.valueOf(Float.java:417) at dm_pc_play_kpi.__loadFromFields(dm_pc_play_kpi.java:335) at dm_pc_play_kpi.parse(dm_pc_play_kpi.java:282) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) ... 10 more
3 分隔符错误
Hive中的分隔符是’\t’ 但是导出写成’|’
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '|' --null-non-string '0' --null-string '0';
为了测试,特意把分隔符改成 "|"
14/11/06 01:50:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 01:50:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 01:50:19 INFO tool.CodeGenTool: Beginning code generation
14/11/06 01:50:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:50:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:50:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/bigdata/hadoop
Note: /tmp/sqoop-root/compile/e474b3f8292f91dd4134b302ae35df19/dm_trlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/06 01:50:25 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e474b3f8292f91dd4134b302ae35df19/dm_trlog.jar
14/11/06 01:50:25 INFO mapreduce.ExportJobBase: Beginning export of dm_trlog
14/11/06 01:50:45 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:50:45 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:50:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/06 01:50:45 WARN snappy.LoadSnappy: Snappy native library not loaded
14/11/06 01:50:51 INFO mapred.JobClient: Running job: job_201411060114_0003
14/11/06 01:50:52 INFO mapred.JobClient: map 0% reduce 0%
14/11/06 01:51:20 INFO mapred.JobClient: Task Id : attempt_201411060114_0003_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at dm_trlog.__loadFromFields(dm_trlog.java:252)
at dm_trlog.parse(dm_trlog.java:201)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
14/11/06 01:51:21 INFO mapred.JobClient: Task Id : attempt_201411060114_0003_m_000001_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at dm_trlog.__loadFromFields(dm_trlog.java:252)
at dm_trlog.parse(dm_trlog.java:201)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
4 mysql的网络不在集群网络中
sqoop export --connect jdbc:mysql://192.168.119.1:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
Warning: $HADOOP_HOME is deprecated.
14/11/06 02:04:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 02:04:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 02:04:30 INFO tool.CodeGenTool: Beginning code generation
14/11/06 02:07:40 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:822)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:605)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:628)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:235)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:219)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:283)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1255)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1072)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:218)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:305)
... 32 more
5 mysql停止服务
14/11/06 04:55:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
14/11/06 04:55:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
14/11/06 04:55:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
14/11/06 04:55:25 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.MySQLManager@8a0d5d
14/11/06 04:55:25 INFO tool.CodeGenTool: Beginning code generation
14/11/06 04:55:26 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
14/11/06 04:55:27 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:822)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:605)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:628)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:235)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:219)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:283)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1255)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1072)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:218)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:305)
... 32 more
14/11/06 04:55:27 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
14/11/06 04:55:27 ERROR manager.CatalogQueryManager: Failed to list columns from query: SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = (SELECT SCHEMA()) AND TABLE_NAME = 'dm_trlog'
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:822)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.CatalogQueryManager.getColumnNames(CatalogQueryManager.java:147)
at org.apache.sqoop.orm.ClassWriter.getColumnNames(ClassWriter.java:1222)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1074)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:218)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:305)
... 28 more
14/11/06 04:55:27 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
6 mysql utf8编码只是3个字节,可能是因为某些unicode字符转成utf8之后变成了4个字节,需要mysql支持utf8mb4
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. 14/11/09 07:00:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 14/11/09 07:00:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 14/11/09 07:00:33 INFO tool.CodeGenTool: Beginning code generation 14/11/09 07:00:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_go_snger` AS t LIMIT 1 14/11/09 07:00:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_go_snger` AS t LIMIT 1 14/11/09 07:00:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop/hadoop-1.1.2 Note: /tmp/sqoop-hadoop/compile/211b9679d4ac771d3ff710cbbd1c7277/dm_go_snger.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 14/11/09 07:00:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/211b9679d4ac771d3ff710cbbd1c7277/dm_go_snger.jar 14/11/09 07:00:35 INFO mapreduce.ExportJobBase: Beginning export of dm_go_snger 14/11/09 07:00:36 INFO input.FileInputFormat: Total input paths to process : 1 14/11/09 07:00:36 INFO input.FileInputFormat: Total input paths to process : 1 14/11/09 07:00:36 WARN snappy.LoadSnappy: Snappy native library is available 14/11/09 07:00:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library 14/11/09 07:00:36 INFO snappy.LoadSnappy: Snappy native library loaded 14/11/09 07:00:37 INFO mapred.JobClient: Running job: job_201408301703_121362 14/11/09 07:00:38 INFO mapred.JobClient: map 0% reduce 0% 14/11/09 07:00:52 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_0, Status : FAILED java.io.IOException: Can't export data, please check task tracker logs at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1 at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220) at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84) ... 10 more Caused by: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693) at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102) at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364) at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233) 14/11/09 07:00:59 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_1, Status : FAILED java.io.IOException: Can't export data, please check task tracker logs at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1 at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220) at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84) ... 10 more Caused by: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693) at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102) at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364) at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233) 14/11/09 07:01:07 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_2, Status : FAILED java.io.IOException: Can't export data, please check task tracker logs at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1 at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220) at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84) ... 10 more Caused by: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693) at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102) at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364) at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233) 14/11/09 07:01:20 INFO mapred.JobClient: Job complete: job_201408301703_121362 14/11/09 07:01:20 INFO mapred.JobClient: Counters: 8 14/11/09 07:01:20 INFO mapred.JobClient: Job Counters 14/11/09 07:01:20 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31500 14/11/09 07:01:20 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 14/11/09 07:01:20 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 14/11/09 07:01:20 INFO mapred.JobClient: Rack-local map tasks=3 14/11/09 07:01:20 INFO mapred.JobClient: Launched map tasks=4 14/11/09 07:01:20 INFO mapred.JobClient: Data-local map tasks=1 14/11/09 07:01:20 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 14/11/09 07:01:20 INFO mapred.JobClient: Failed map tasks=1 14/11/09 07:01:20 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 44.4052 seconds (0 bytes/sec) 14/11/09 07:01:20 INFO mapreduce.ExportJobBase: Exported 0 records. 14/11/09 07:01:20 ERROR tool.ExportTool: Error during export: Export job failed! WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. Logging initialized using configuration in jar:file:/home/hadoop/hadoop/hive-0.10.0.20140629/lib/hive-common-0.10.0.jar!/hive-log4j.properties Hive history file=/tmp/hadoop/hive_job_log_hadoop_201411090701_1423933997.txt
7 Sqoop调式信息
增加关键字--verbose
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --verbose --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
8 修改生成的Java类,重新打包。
每次通过sqoop导入MySql的时,都会生成一个以MySql表命名的.java文件,然后打成JAR包,给sqoop提交给hadoop 的MR来解析Hive表中的数据。那可以根据报的错误,找到对应的行,改写该文件,编译,重新打包,sqoop可以通过 -jar-file ,--class-name 组合让我们指定运行自己的jar包中的某个class。来解析该hive表中的每行数据。脚本如下:一个完整的例子如下:
./bin/sqoop export --connect "jdbc:mysql://192.168.119.129:3306/student?useUnicode=true&characterEncoding=utf-8" --username li72 --password 123 --table dm_trlog --export-dir /hive/warehouse/trlog --input-fields-terminated-by '\t' --input-null-string '\\N' --input-null-non-string '\\N' --class-name com.li72.trlog --jar-file /tmp/sqoopTempjar/trlog.jar 上面--jar-file 参数指定jar包的路径。--class-name 指定jar包中的class。 这样就可以解决所有解析异常了。Sqoop命令行说明
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 | Common arguments: --connect < jdbc-uri > Specify JDBC connect string --connection-manager < class-name > Specify connection manager class name --connection-param-file < properties-file > Specify connection parameters file --driver < class-name > Manually specify JDBC driver class to use --hadoop-home < hdir > Override $HADOOP_MAPRED_HOME_ARG --hadoop-mapred-home < dir > Override $HADOOP_MAPRED_HOME_ARG --help Print usage instructions -P Read password from console --password < password > Set authentication password --password-file < password-file > Set authentication password file path --relaxed-isolation Use read-uncommitted isolation for imports --skip-dist-cache Skip copying jars to distributed cache --username < username > Set authentication username --verbose Print more information while working Import control arguments: --append Imports data in append mode --as-avrodatafile Imports data to Avro data files --as-sequencefile Imports data to SequenceFile s --as-textfile Imports data as plain text (default) --boundary-query < statement > Set boundary query for retrieving max and min value of the primary key --columns < col ,col,col...> Columns to import from table --compression-codec < codec > Compression codec to use for import --delete-target-dir Imports data in delete mode --direct Use direct import fast path --direct-split-size < n > Split the input stream every 'n' bytes when importing in direct mode -e,--query < statement > Import results of SQL 'statement' --fetch-size < n > Set number 'n' of rows to fetch from the database when more rows are needed --inline-lob-limit < n > Set the maximum size for an inline LOB -m,--num-mappers < n > Use 'n' map tasks to import in parallel --mapreduce-job-name < name > Set name for generated mapreduce job --split-by < column-name > Column of the table used to split work units --table < table-name > Table to read --target-dir < dir > HDFS plain table destination --validate Validate the copy using the configured validator --validation-failurehandler < validation-failurehandler > Fully qualified class name for ValidationFa ilureHandler --validation-threshold < validation-threshold > Fully qualified class name for ValidationTh reshold --validator < validator > Fully qualified class name for the Validator --warehouse-dir < dir > HDFS parent for table destination --where < where clause> WHERE clause to use during import -z,--compress Enable compression Incremental import arguments: --check-column < column > Source column to check for incremental change --incremental < import-type > Define an incremental import of type 'append' or 'lastmodified' --last-value < value > Last imported value in the incremental check column Output line formatting arguments: --enclosed-by < char > Sets a required field enclosing character --escaped-by < char > Sets the escape character --fields-terminated-by < char > Sets the field separator character --lines-terminated-by < char > Sets the end-of-line character --mysql-delimiters Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' --optionally-enclosed-by < char > Sets a field enclosing character Input parsing arguments: --input-enclosed-by < char > Sets a required field encloser --input-escaped-by < char > Sets the input escape character --input-fields-terminated-by < char > Sets the input field separator --input-lines-terminated-by < char > Sets the input end-of-line char --input-optionally-enclosed-by < char > Sets a field enclosing character Hive arguments: --create-hive-table Fail if the target hive table exists --hive-database < database-name > Sets the database name to use when importing to hive --hive-delims-replacement < arg > Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string --hive-drop-import-delims Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields --hive-home < dir > Override $HIVE_HOME --hive-import Import tables into Hive (Uses Hive's default delimiters if none are set.) --hive-overwrite Overwrite existing data in the Hive table --hive-partition-key < partition-key > Sets the partition key to use when importing to hive --hive-partition-value < partition-value > Sets the partition value to use when importing to hive --hive-table < table-name > Sets the table name to use when importing to hive --map-column-hive < arg > Override mapping for specific column to hive types. HBase arguments: --column-family < family > Sets the target column family for the import --hbase-bulkload Enables HBase bulk loading --hbase-create-table If specified, create missing HBase tables --hbase-row-key < col > Specifies which input column to use as the row key --hbase-table < table > Import to < table > in HBase HCatalog arguments: --hcatalog-database < arg > HCatalog database name --hcatalog-home < hdir > Override $HCAT_HOME --hcatalog-table < arg > HCatalog table name --hive-home < dir > Override $HIVE_HOME --hive-partition-key < partition-key > Sets the partition key to use when importing to hive --hive-partition-value < partition-value > Sets the partition value to use when importing to hive --map-column-hive < arg > Override mapping for specific column to hive types. HCatalog import specific options: --create-hcatalog-table Create HCatalog before import --hcatalog-storage-stanza < arg > HCatalog storage stanza for table creation Accumulo arguments: --accumulo-batch-size < size > Batch size in bytes --accumulo-column-family < family > Sets the target column family for the import --accumulo-create-table If specified, create missing Accumulo tables --accumulo-instance < instance > Accumulo instance name. --accumulo-max-latency < latency > Max write latency in milliseconds --accumulo-password < password > Accumulo password. --accumulo-row-key < col > Specifies which input column to use as the row key --accumulo-table < table > Import to < table > in Accumulo --accumulo-user < user > Accumulo user name. --accumulo-visibility < vis > Visibility token to be applied to all rows imported --accumulo-zookeepers < zookeepers > Comma-separated list of zookeepers (host:port) Code generation arguments: --bindir < dir > Output directory for compiled objects --class-name < name > Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. --input-null-non-string < null-str > Input null non-string representation --input-null-string < null-str > Input null string representation --jar-file < file > Disable code generation; use specified jar --map-column-java < arg > Override mapping for specific columns to java types --null-non-string < null-str > Null non-string representation --null-string < null-str > Null string representation --outdir < dir > Output directory for generated code --package-name < name > Put auto-generated classes in this package |