Ruben Agudo created SQOOP-3471:
----------------------------------

             Summary: While doing sqoop-export mapper progress goes back causing duplicated data
                 Key: SQOOP-3471
                 URL: https://issues.apache.org/jira/browse/SQOOP-3471
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.6
            Reporter: Ruben Agudo
         Attachments: image-2020-04-21-10-36-15-108.png

We are running the sqoop-export tool on Qubole to export data from S3 back to a
SQL Server database.

Our issue is that sometimes one of the mappers in the map phase seems to fail
and restart. Basically, we see the map progress going backwards, as in the
following image:

!image-2020-04-21-10-36-15-108!

This is causing duplicates in our destination table. I'm a bit confused,
because the documentation says *"If an export map task fails due to these or
other reasons, it will cause the export job to fail."*, and that is not the
behaviour we are seeing: the job completes, but with duplicated rows.

Unfortunately, we can't reproduce it consistently.

The command that we are running is:

sqoop export \
 -Dsqoop.export.records.per.statement=50000 \
 -Dsqoop.export.statements.per.transaction=100 \
 -Dsqoop.throwOnError=1 \
 --connection-manager org.apache.sqoop.manager.SQLServerManager \
 --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
 --connect connectionString \
 --table config.table \
 --export-dir config.source \
 --input-fields-terminated-by , \
 --num-mappers 8 \
 --columns theColumnsToCopy \
 --batch \
 --schema theSchema

I replaced the values that I can't share for privacy reasons with placeholder
names.
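For context, here is a rough worked example of the commit granularity that the two -D settings above imply. It assumes, following the Sqoop user guide's description of exports, that each writer (map task) commits once every sqoop.export.statements.per.transaction statements, each carrying up to sqoop.export.records.per.statement rows. It also assumes that this still holds with --batch, and that a re-attempted map task reprocesses its input split from the beginning. committed_transactions is a hypothetical illustration value, not something we measured.

```shell
#!/usr/bin/env bash
# Values copied from the -D options in the sqoop command above.
records_per_statement=50000
statements_per_transaction=100

# Assumption: each map task (writer) commits after
# statements_per_transaction statements of records_per_statement rows each.
rows_per_commit=$(( records_per_statement * statements_per_transaction ))
echo "rows per committed transaction, per mapper: ${rows_per_commit}"

# Hypothetical: a map attempt commits this many transactions and then fails.
committed_transactions=2

# Assumption: the retried attempt re-reads its whole input split, so rows
# already committed by the failed attempt are inserted a second time.
duplicated_rows=$(( committed_transactions * rows_per_commit ))
echo "rows a retried attempt could insert twice: ${duplicated_rows}"
```

Under these assumptions, each mapper commits in chunks of up to 5,000,000 rows. A single retried attempt could therefore plausibly duplicate millions of rows, because the user guide describes exports as non-atomic, so earlier commits stay in the table.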

What could cause a mapper's progress to go backwards? And if that happens, is
it possible to make the Sqoop export fail instead?

Also, if this isn't the correct channel for this question, please let me know.

Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)