[jira] [Commented] (SQOOP-2811) Sqoop2: Extracting sequence files may result in duplicates

Sqoop QA bot (JIRA) Fri, 29 Jan 2016 17:12:48 -0800

    [ https://issues.apache.org/jira/browse/SQOOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124564#comment-15124564 ]


Sqoop QA bot commented on SQOOP-2811:
-------------------------------------

Testing file [SQOOP-2811.patch|https://issues.apache.org/jira/secure/attachment/12785295/SQOOP-2811.patch] against branch sqoop2 took 1:05:57.655409.

{color:red}Overall:{color} -1 due to error(s), see details below:

{color:green}SUCCESS:{color} Clean was successful
{color:green}SUCCESS:{color} Patch applied correctly
{color:red}ERROR:{color} Patch does not add/modify any test case
{color:green}SUCCESS:{color} License check passed
{color:green}SUCCESS:{color} Patch compiled
{color:green}SUCCESS:{color} All unit tests passed (executed 1676 tests)
{color:orange}WARNING:{color} Test coverage has decreased ([report|https://builds.apache.org/job/PreCommit-SQOOP-Build/2153/artifact/patch-process/cobertura_report.txt])
* Package {{connector/connector-hdfs}} has lower test coverage: Line coverage decreased by 5% (from 80% to 75%), Branch coverage decreased by 0% (from 59% to 59%)


{color:green}SUCCESS:{color} No new findbugs warnings ([report|https://builds.apache.org/job/PreCommit-SQOOP-Build/2153/artifact/patch-process/findbugs_report.txt])
{color:green}SUCCESS:{color} All integration tests passed (executed 190 tests)

Console output is available [here|https://builds.apache.org/job/PreCommit-SQOOP-Build/2153/console].

This message is automatically generated.

> Sqoop2: Extracting sequence files may result in duplicates
> ----------------------------------------------------------
>
>                 Key: SQOOP-2811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2811
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.99.6
>            Reporter: Abraham Fine
>            Assignee: Abraham Fine
>         Attachments: SQOOP-2811.patch
>
>
> In the HDFS extractor we use:
> {code:java}
>     if (start > filereader.getPosition()) {
>       filereader.sync(start); // sync to start
>     }
> {code}
> to jump to the point in the sequence file where extraction should begin.
> If the sequence file is small, multiple start points may {{sync}} to the same
> point, and we could end up extracting the same records multiple times.
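
The scenario described in the issue can be illustrated with a toy model. The sketch below does not use Hadoop's SequenceFile.Reader, and it is not taken from SQOOP-2811.patch: the class SyncDuplicateSketch, the record offsets, the sync-mark position, the split size, and the guard are all hypothetical. It assumes a reader that always reads the first record after syncing and stops only when it reaches a sync mark at or past the end of its split. Under that assumption, two splits that sync to the same mark both extract the records that follow it, and checking the post-sync position against the split end is one possible way to avoid that.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names and numbers here are assumptions for illustration.
final class SyncDuplicateSketch {
    // Record i starts at byte offset RECORD_OFFSETS[i].
    static final long[] RECORD_OFFSETS = {0, 40, 80, 120, 160};
    // The only sync mark after the file header sits at offset 120.
    static final long[] SYNC_MARKS = {120};
    static final long FILE_LENGTH = 200;
    static final long SPLIT_SIZE = 50;

    // Assumed behaviour: move to the first sync mark at or after start,
    // or to the end of the file if there is none.
    static long sync(long start) {
        for (long mark : SYNC_MARKS) {
            if (mark >= start) {
                return mark;
            }
        }
        return FILE_LENGTH;
    }

    static boolean isSyncMark(long offset) {
        for (long mark : SYNC_MARKS) {
            if (mark == offset) {
                return true;
            }
        }
        return false;
    }

    // Extracts the record offsets read for the split [start, end).
    static List<Long> extract(long start, long end, boolean guard) {
        List<Long> out = new ArrayList<>();
        long pos = (start > 0) ? sync(start) : 0;
        if (guard && pos >= end) {
            return out; // the sync point belongs to a later split
        }
        for (long offset : RECORD_OFFSETS) {
            if (offset < pos) {
                continue; // before the position we synced to
            }
            // Stop at a sync mark at or past the split end, but only
            // after at least one record has been read (the naive part).
            if (offset >= end && isSyncMark(offset) && !out.isEmpty()) {
                break;
            }
            out.add(offset);
        }
        return out;
    }

    // Runs every split over the toy file and concatenates the results.
    static List<Long> extractAll(boolean guard) {
        List<Long> all = new ArrayList<>();
        for (long s = 0; s < FILE_LENGTH; s += SPLIT_SIZE) {
            all.addAll(extract(s, Math.min(s + SPLIT_SIZE, FILE_LENGTH), guard));
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + extractAll(false));
        System.out.println("guarded: " + extractAll(true));
    }
}
```

In this model the splits starting at 50 and 100 both sync to offset 120, so the naive run returns the records at 120 and 160 twice, while the guarded run returns each record once.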



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
