[ 
https://issues.apache.org/jira/browse/SQOOP-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284403#comment-17284403
 ] 

jw commented on SQOOP-3471:
---------------------------

Hello, I suggest passing the parameter "-D 
mapreduce.map.failures.maxpercent=0" so that a single failed map task causes 
the whole export job to fail. After a failure, clean up the data that has 
already been written to the DB from the application layer, so that the export 
is idempotent at that level. 
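
As a rough sketch of that approach (not something taken from the Sqoop 
documentation), a small wrapper can run the export and, if it exits with a 
non-zero status, run a cleanup step before reporting the failure. The function 
name export_with_cleanup and both command strings are hypothetical 
placeholders. The cleanup itself (for example, deleting the rows written by 
the failed run) depends on your schema and is assumed to be supplied by you.

```shell
#!/bin/sh
# Sketch only: run an export command and, if it fails, run a cleanup command
# so that the whole operation can be retried from a clean state.
# Both arguments are command strings supplied by the caller (hypothetical).

export_with_cleanup() {
    export_cmd=$1    # e.g. the full "sqoop export ..." command line
    cleanup_cmd=$2   # e.g. a command that deletes rows written by this run

    if sh -c "$export_cmd"; then
        echo "export succeeded"
        return 0
    fi

    echo "export failed; cleaning up partially exported rows"
    sh -c "$cleanup_cmd"
    return 1
}

# Example call, reusing the placeholders from the original command:
# export_with_cleanup \
#   "sqoop export -D mapreduce.map.failures.maxpercent=0 --connect connectionString --table config.table --export-dir config.source" \
#   "your-cleanup-command"
```

The wrapper always returns non-zero after a failed export, even when the 
cleanup succeeds, so a scheduler can still see the failure and decide whether 
to retry.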

> While doing sqoop-export mapper progress goes back causing duplicated data
> --------------------------------------------------------------------------
>
>                 Key: SQOOP-3471
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3471
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Ruben Agudo
>            Priority: Major
>         Attachments: image-2020-04-21-10-36-15-108.png
>
>
> We are running the sqoop-export tool in Qubole, to export some data from S3 
> back to an SQL Server Database.
> Our issue is that sometimes one of the mappers seems to fail or restart. 
> Basically, we see the progress going backwards, as in the following image:
> !image-2020-04-21-10-36-15-108.png!
> This is causing duplicates in our destination table. I'm a bit lost because 
> in the documentation it says that *"If an export map task fails due to these 
> or other reasons, it will cause the export job to fail."* and this is not the 
> behaviour we are seeing.
> Unfortunately, we can't reproduce it consistently.
> The command that we are running is:
> sqoop export 
>  -Dsqoop.export.records.per.statement=50000 
>  -Dsqoop.export.statements.per.transaction=100 
>  -Dsqoop.throwOnError=1 
>  --connection-manager org.apache.sqoop.manager.SQLServerManager 
>  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
>  --connect connectionString 
>  --table config.table 
>  --export-dir config.source
>  --input-fields-terminated-by ,
>  --num-mappers 8
>  --columns theColumnsToCopy
>  --batch
>  --schema theSchema
> I removed the things that I can't add for privacy reasons.
> And the table we want to export contains 237,371,726 records.
> What could be the cause of the mapper going back in progress? And, if that 
> happens, is it possible to make the sqoop export fail?
> Also, if this isn't the correct channel for this, please let me know.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
