[ https://issues.apache.org/jira/browse/SQOOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126953#comment-15126953 ]
Hudson commented on SQOOP-2811:
-------------------------------

SUCCESS: Integrated in Sqoop2 #1001 (See [https://builds.apache.org/job/Sqoop2/1001/])
SQOOP-2811: Sqoop2: Extracting sequence files may result in duplicates (jarcec: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=118aa7c4f9cb7ed3a81ce792e7bf56d31f9107e5])
* connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java
* connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/SqoopTaskAttemptContext.java
* connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsPartition.java

> Sqoop2: Extracting sequence files may result in duplicates
> ----------------------------------------------------------
>
>                 Key: SQOOP-2811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2811
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.99.6
>            Reporter: Abraham Fine
>            Assignee: Abraham Fine
>             Fix For: 1.99.7
>
>         Attachments: SQOOP-2811.patch
>
>
> In the hdfs extractor we use:
> {code:java}
> if (start > filereader.getPosition()) {
>   filereader.sync(start); // sync to start
> }
> {code}
> to jump to the correct point in the sequence file that we want to extract.
> If the sequence file is small, multiple start points may `sync` to the same
> point and we could end up extracting the same record multiple times.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
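
For readers following the quoted description, the failure mode can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use Hadoop's SequenceFile.Reader, and the class name, the `syncTo` helper, the sync marker offsets, and the split start offsets are all hypothetical values chosen for illustration. It assumes that a "sync" moves the reader to the first sync marker at or after the requested offset.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncDuplicateSketch {
    // Hypothetical byte offsets of sync markers in a small file.
    static final long[] SYNC_POINTS = {0L, 100L};
    // Hypothetical total file length in bytes.
    static final long FILE_LENGTH = 150L;

    // Toy stand-in for a sync call (an assumption, not the Hadoop API):
    // move to the first sync marker at or after pos, or to the end of
    // the file if no such marker exists.
    static long syncTo(long pos) {
        for (long s : SYNC_POINTS) {
            if (s >= pos) {
                return s;
            }
        }
        return FILE_LENGTH;
    }

    // For each split start, compute the offset at which reading would begin,
    // mirroring the shape of the check quoted in the issue description.
    static List<Long> readStarts(long[] splitStarts) {
        List<Long> result = new ArrayList<>();
        for (long start : splitStarts) {
            long position = 0L; // the toy reader begins at the top of the file
            if (start > position) {
                position = syncTo(start); // sync to start
            }
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three hypothetical splits over the same small file.
        long[] splitStarts = {0L, 40L, 80L};
        System.out.println(readStarts(splitStarts)); // prints [0, 100, 100]
    }
}
```

In this model the splits starting at 40 and 80 both land on the marker at offset 100, so two extractors would begin reading from the same record. Whether the committed patch addresses this in the same terms is not stated in this message; see the attached SQOOP-2811.patch for the actual change.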