[ https://issues.apache.org/jira/browse/SQOOP-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834903#comment-16834903 ]
Ricardo Gaspar commented on SQOOP-3240: --------------------------------------- Hi, I'm having a similar problem when using the direct mode against an Oracle database (based on Oracle 12) and Sqoop 1.4.6-cdh5.14.4. In my case I'm not even specifying any query or where clause, just sqooping the full table. [~infosuresh2k], did you find any solution/fix for this in the meantime? It's been a while since this issue was posted. An obvious but not very efficient solution would be to set mappers to 1 (when no single PK is available to use with the split-by argument). > Hive table Row count mismatch(more records in hadoop/hive than oracle) when > importing data from oracle using sqoop import with direct mode > ------------------------------------------------------------------------------------------------------------------------------------------ > > Key: SQOOP-3240 > URL: https://issues.apache.org/jira/browse/SQOOP-3240 > Project: Sqoop > Issue Type: Bug > Components: connectors/oracle > Affects Versions: 1.4.6 > Environment: Sqoop 1.4.6-cdh5.8.2 > Oracle 11i > oracle table Sqoop import with direct mode > Reporter: Suresh Venkatesan > Priority: Minor > > Hive/Hadoop row count was not matching on sqoop import of oracle table with > direct mode, it's perfectly matching without direct mode. > See the below examples. > 1. Row Count is varying based on fetch-size 10000 and direct mode. > 2. Row count is not consistent, sometimes it is 630869 and sometimes 639924 > etc., > 3. We are pulling data with direct mode for 1 year with monthly partition, > data mismatch happening for couple of months not for all months. > 4. Hive/Hadoop table Count looks good, If we import the data from oracle > without direct mode. > Actual row count in oracle table for year 2015 and month 12 is : 609924, we > are getting more records on sqoop import with direct mode : 630869. hive > table count is matching with oracle count on sqoop import *without direct* > mode : 609924 > *1. sqoop import without direct mode, Row Count : 609924* > sqoop import -Dmapredue.job.queuename=queue > -Doraoop.import.partitions=oracle_table_partition_201512 --connect > jdbc:oracle:thin:@//xxxxx.yyyyy.com:2408/zzzz --username user_id -P --table > oracle_db.oracle_table --split-by split_column --hive-import --hive-table > hive_db.hive_table --where "TO_DATE(oracle_timestamp_column) BETWEEN > TO_DATE('12/01/2015','mm/dd/yyyy') AND TO_DATE('12/31/2015','mm/dd/yyyy')" > --fields-terminated-by '\001' --lines-terminated-by '\n' --as-textfile -m 4 > *2. sqoop import with direct mode, Row Count : 630869* > sqoop import -Dmapredue.job.queuename=queue > -Doraoop.import.partitions=oracle_table_partition_201512 --connect > jdbc:oracle:thin:@//xxxxx.yyyyy.com:2408/zzzz --username user_id -P --table > oracle_db.oracle_table --split-by split_column --hive-import --hive-table > hive_db.hive_table --where "TO_DATE(oracle_timestamp_column) BETWEEN > TO_DATE('12/01/2015','mm/dd/yyyy') AND TO_DATE('12/31/2015','mm/dd/yyyy')" > --fields-terminated-by '\001' --lines-terminated-by '\n' --as-textfile -m 4 > --direct -- This message was sent by Atlassian JIRA (v7.6.3#76005)