[ https://issues.apache.org/jira/browse/SQOOP-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149309#comment-16149309 ]
Markus Kemper commented on SQOOP-3014:
--------------------------------------

Adding additional test case (below) with the following comments:
* Import into HDFS seems to be working as expected
* The hcatalog load phase is not working as noted by others
** The issue does not appear to be specific to HDFS ORC files

*Test Case*
{noformat}
#######################
# STEP 01 - CREATE SQL TABLE AND DATA
#######################

export MYCONN=jdbc:oracle:thin:@sqoop.apache.com:1521/db11g
export MYUSER=sqoop
export MYPSWD=sqoop

sqoop list-tables --connect $MYCONN --username $MYUSER --password $MYPSWD
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table sqoop_3014"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table sqoop_3014 (c1 integer, c2 decimal(22,5), c3 varchar(40))"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into sqoop_3014 values (1, 454018528782.42006329, '454018528782.42006329')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into sqoop_3014 values (2, 87658675864540185.123456789123456789, '87658675864540185.123456789123456789')"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014"

----------------------------------------------------------------------
| C1 | C2                      | C3                                   |
----------------------------------------------------------------------
| 1  | 454018528782.42006      | 454018528782.42006329                |
| 2  | 87658675864540185.12346 | 87658675864540185.123456789123456789 |
----------------------------------------------------------------------

#######################
# STEP 02 - IMPORT DATA INTO HDFS (--as-textfile)
#######################

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table SQOOP_3014 --target-dir /user/root/sqoop_3014 --delete-target-dir --num-mappers 1 --as-textfile --verbose
hdfs dfs -cat /user/root/sqoop_3014/part*

Output:
1,454018528782.42006,454018528782.42006329
2,87658675864540185.12346,87658675864540185.123456789123456789

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014 where \$CONDITIONS" --target-dir /user/root/sqoop_3014 --delete-target-dir --num-mappers 1 --as-textfile --verbose
hdfs dfs -cat /user/root/sqoop_3014/part*

Output:
1,454018528782.42006,454018528782.42006329
2,87658675864540185.12346,87658675864540185.123456789123456789

#######################
# STEP 03 - IMPORT DATA INTO HIVE (--hcatalog-storage-stanza "stored as textfile")
#######################

beeline -u jdbc:hive2:// -e "use db1; drop table sqoop_3014_text purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014 where \$CONDITIONS" --hcatalog-database db1 --hcatalog-table sqoop_3014_text --create-hcatalog-table --hcatalog-storage-stanza "stored as textfile" --num-mappers 1 --verbose
beeline -u jdbc:hive2:// -e "use db1; select * from sqoop_3014_text; show create table sqoop_3014_text;"

Output:
+---------------------+--------------------------+---------------------------------------+--+
| sqoop_3014_text.c1  | sqoop_3014_text.c2       | sqoop_3014_text.c3                    |
+---------------------+--------------------------+---------------------------------------+--+
| 1                   | 454018528782.42004       | 454018528782.42006329                 |
| 2                   | 87658675864540192        | 87658675864540185.123456789123456789  |
+---------------------+--------------------------+---------------------------------------+--+

#######################
# STEP 04 - IMPORT DATA INTO HIVE (--hcatalog-storage-stanza "stored as orc")
#######################

beeline -u jdbc:hive2:// -e "use db1; drop table sqoop_3014_orc purge;"
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from sqoop_3014 where \$CONDITIONS" --hcatalog-database db1 --hcatalog-table sqoop_3014_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orc" --num-mappers 1 --verbose
beeline -u jdbc:hive2:// -e "use db1; select * from sqoop_3014_orc; show create table sqoop_3014_orc;"

Output:
+---------------------+--------------------------+---------------------------------------+--+
| sqoop_3014_text.c1  | sqoop_3014_text.c2       | sqoop_3014_text.c3                    |
+---------------------+--------------------------+---------------------------------------+--+
| 1                   | 454018528782.42004       | 454018528782.42006329                 |
| 2                   | 87658675864540192        | 87658675864540185.123456789123456789  |
+---------------------+--------------------------+---------------------------------------+--+
+----------------------------------------------------------------+--+
|                         createtab_stmt                         |
+----------------------------------------------------------------+--+
| CREATE TABLE `sqoop_3014_text`(                                 |
|   `c1` decimal(38,0),                                           |
|   `c2` decimal(22,5),                                           |
|   `c3` varchar(40))                                             |
| ROW FORMAT SERDE                                                |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'          |
| STORED AS INPUTFORMAT                                           |
|   'org.apache.hadoop.mapred.TextInputFormat'                    |
| OUTPUTFORMAT                                                    |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION                                                        |
|   'hdfs://nameservice1/data/dbs/db1/sqoop_3014_text'            |
| TBLPROPERTIES (                                                 |
|   'transient_lastDdlTime'='1504198953')                         |
+----------------------------------------------------------------+--+
{noformat}
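For reference, the corrupted values in the STEP 03/04 output line up with ordinary double rounding at these magnitudes. The following is a minimal, standalone Java sketch (not Sqoop code; the class name is made up for illustration) that pushes the two test values through a double the way the DECIMAL conversion discussed in the issue description below does:

{code}
import java.math.BigDecimal;
import java.math.MathContext;

// Illustration only: routing the test values through a double reproduces the
// corrupted numbers observed in the hcatalog/Hive output above.
public class DoubleRoundTripDemo {
    public static void main(String[] args) {
        String[] values = {
            "454018528782.42006329",
            "87658675864540185.123456789123456789"
        };
        for (String s : values) {
            BigDecimal original = new BigDecimal(s);
            // Same effect as converting the value via Number.doubleValue()
            BigDecimal viaDouble = new BigDecimal(original.doubleValue(), MathContext.DECIMAL128);
            System.out.println(original + " -> " + viaDouble);
        }
        // Prints:
        //   454018528782.42006329 -> 454018528782.4200439453125   (decimal(22,5) shows ...42004)
        //   87658675864540185.123456789123456789 -> 87658675864540192
        // Doubles are spaced roughly 0.00006 apart near 4.5e11 and 16 apart near 8.8e16,
        // which matches the corruption pattern seen above.
    }
}
{code}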
> Sqoop with HCatalog import loose precision for large numbers that does not fit into double
> ------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-3014
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3014
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.6
>            Reporter: Pavel Benes
>            Assignee: Venkat Ranganathan
>            Priority: Critical
>             Fix For: 1.4.7
>
>         Attachments: oracle-sqoop-error.png
>
>
> When using Sqoop with HCatalog to import data over JDBC (I have tried Oracle 11), all numbers that do not fit into a double lose precision or are distorted.
> Steps to reproduce:
> 1) Create a test table in Oracle and fill it with test data
> {code}
> CREATE TABLE TEST_SQOOP_ERROR(ID VARCHAR(10), TYPE_NUMBER DECIMAL(22,5))
>
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-1', 454018528782.42006329)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-2', 87658675864540185.123456789123456789)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-3', 87658675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-4', 87658675864540185.123)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-5', 7658675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-6', 7658675864540185.123456789)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-7', 658675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-8', 58675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-9', 8675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-10', 675864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-11', 75864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-12', 35864540185.12345)
> INSERT INTO MMDINGEST.TEST_SQOOP_ERROR(ID, TYPE_NUMBER) VALUES ( 'row-13', 5864540185.12345)
> {code}
> 2) Create a table in the Hive database
> {code}
> CREATE TABLE pbe_test_sqoop_error(id string, type_number decimal(22,5)) STORED AS ORC;
> {code}
> 3) Import data from Oracle to Hive using sqoop
> {code}
> export HADOOP_CLASSPATH=/opt/mantis/jdbc/oracle-11.2/ojdbc6.jar
> sqoop import -connect jdbc:oracle:thin:@//1.1.1.1:1521/XE --username XXX --password XXX --hcatalog-database default --hcatalog-table pbe_test_sqoop_error --driver oracle.jdbc.OracleDriver --query 'SELECT id, type_number FROM MMDINGEST.TEST_SQOOP_ERROR WHERE $CONDITIONS' -m 1
> {code}
> 4) Display the data from the Hive table
> {code}
> hive> select * from pbe_test_sqoop_error;
> OK
> row-1   454018528782.42004
> row-2   87658675864540192
> row-3   87658675864540192
> row-4   87658675864540192
> row-5   7658675864540185
> row-6   7658675864540185
> row-7   658675864540185.125
> row-8   58675864540185.125
> row-9   8675864540185.12305
> row-10  675864540185.12341
> row-11  75864540185.12344
> row-12  35864540185.12345
> row-13  5864540185.12345
> Time taken: 0.455 seconds, Fetched: 13 row(s)
> {code}
> Only the values at rows 1, 12, and 13 are correct. At rows 2-4 even the part of the number before the decimal point is wrong. Everything looks correct in Oracle, as can be seen in the attached screenshot.
> The problem seems to be in the Java class
> https://www.codatlas.com/github.com/apache/sqoop/branch-1.4.6/src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportHelper.java
> starting at line 437:
> {code}
> } else if (hfsType == HCatFieldSchema.Type.DECIMAL) {
>   BigDecimal bd = new BigDecimal(n.doubleValue(), MathContext.DECIMAL128);
>   return HiveDecimal.create(bd);
> }
> {code}
> All numbers, even those that arrive as a BigDecimal, are squeezed through a double, which leads to the precision loss. The same issue may exist in other places that work with large numbers.
> The following code fixes this issue:
> {code}
> } else if (hfsType == HCatFieldSchema.Type.DECIMAL) {
>   BigDecimal bd = val instanceof BigDecimal
>       ? (BigDecimal) val
>       : new BigDecimal(n.doubleValue(), MathContext.DECIMAL128);
>   return HiveDecimal.create(bd);
> }
> {code}
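To see the effect of the proposed change outside of Sqoop, here is a minimal sketch that mirrors the patched DECIMAL branch using plain java.math types; the HiveDecimal.create call is omitted and the class/method names are invented for illustration:

{code}
import java.math.BigDecimal;
import java.math.MathContext;

// Illustration only: mirrors the proposed conversion logic without Hive classes.
public class DecimalBranchSketch {

    // Patched behaviour: keep a BigDecimal handed over by the JDBC driver as-is,
    // and fall back to the (lossy) double path only for other Number types.
    static BigDecimal toDecimal(Object val) {
        Number n = (Number) val;
        return (val instanceof BigDecimal)
            ? (BigDecimal) val
            : new BigDecimal(n.doubleValue(), MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        BigDecimal fromJdbc = new BigDecimal("87658675864540185.12346");
        System.out.println(toDecimal(fromJdbc)); // 87658675864540185.12346 (full precision kept)
        System.out.println(toDecimal(42L));      // 42 (double fallback is harmless for small values)
    }
}
{code}

Whether the DECIMAL branch actually receives a BigDecimal depends on the JDBC driver; for Oracle NUMBER columns the driver normally hands back java.math.BigDecimal, which is why the instanceof check is enough to recover the full precision.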
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)