[ https://issues.apache.org/jira/browse/SQOOP-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284403#comment-17284403 ]
jw commented on SQOOP-3471:
---------------------------

Hello, I suggest using the parameter "-D mapreduce.map.failures.maxpercent=0" to handle the failure of a single map task, so that such a failure *will cause the export job to fail.* Then, after the failure, clean up at the application level the data that had already been exported to the DB, so that the application layer stays idempotent.

> While doing sqoop-export mapper progress goes back causing duplicated data
> --------------------------------------------------------------------------
>
>                 Key: SQOOP-3471
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3471
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Ruben Agudo
>            Priority: Major
>         Attachments: image-2020-04-21-10-36-15-108.png
>
>
> We are running the sqoop-export tool in Qubole to export some data from S3
> back to a SQL Server database.
> Our issue is that sometimes one of the mappers seems to fail or restart:
> basically, we see its progress going back, as in the following image:
> !image-2020-04-21-10-36-15-108.png!
> This is causing duplicates in our destination table. I'm a bit lost, because
> the documentation says that *"If an export map task fails due to these
> or other reasons, it will cause the export job to fail."*, and that is not the
> behaviour we are seeing.
> Unfortunately, we can't reproduce it consistently.
> The command that we are running is:
> sqoop export
> -Dsqoop.export.records.per.statement=50000
> -Dsqoop.export.statements.per.transaction=100
> -Dsqoop.throwOnError=1
> --connection-manager org.apache.sqoop.manager.SQLServerManager
> --driver com.microsoft.sqlserver.jdbc.SQLServerDriver
> --connect connectionString
> --table config.table
> --export-dir config.source
> --input-fields-terminated-by ,
> --num-mappers 8
> --columns theColumnsToCopy
> --batch
> --schema theSchema
> I removed the things that I can't add for privacy reasons.
>
> The table we want to export contains 237,371,726 records.
> What could be causing the mapper's progress to go back? And, if that
> happens, is it possible to make the sqoop export fail?
> Also, if this isn't the correct channel for this, please let me know.
> Thanks!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
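The comment at the top proposes two steps: let a failed export fail outright, then clean up at the application level whatever rows were already committed before running it again. The Python sketch below models that idea with an in-memory SQLite table. It is a simplified illustration under stated assumptions, not Sqoop's behaviour or API:

* The `load_id` column is hypothetical. It tags each row with the run that wrote it, so the source data would need to carry such a column.
* A run commits rows in fixed-size chunks and may fail after some commits.
* `run_naive_retry` shows how re-running after a partial commit duplicates rows, as described in the issue.
* `run_idempotent_export` deletes the failed run's rows first.

```python
import sqlite3

# ASSUMPTION: "load_id" is a hypothetical extra column that tags every row with
# the export run that wrote it. It is not part of the original issue.
SCHEMA = "CREATE TABLE target (id INTEGER, value TEXT, load_id TEXT)"


class ExportFailed(Exception):
    """Stands in for an export attempt that fails part-way through."""


def new_target():
    """Create an in-memory stand-in for the destination table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def counts(conn):
    """Return (total rows, distinct ids) so duplicates are easy to spot."""
    return conn.execute("SELECT COUNT(*), COUNT(DISTINCT id) FROM target").fetchone()


def export_run(conn, load_id, rows, rows_per_tx, fail_after_tx=None):
    """Simplified export attempt: insert and commit rows in chunks of
    rows_per_tx. If fail_after_tx is set, raise after that many commits,
    leaving the already-committed rows in the table."""
    committed = 0
    for start in range(0, len(rows), rows_per_tx):
        chunk = rows[start:start + rows_per_tx]
        conn.executemany(
            "INSERT INTO target (id, value, load_id) VALUES (?, ?, ?)",
            [(row_id, value, load_id) for row_id, value in chunk],
        )
        conn.commit()
        committed += 1
        if fail_after_tx is not None and committed >= fail_after_tx:
            raise ExportFailed(f"failed after {committed} committed transaction(s)")


def cleanup_load(conn, load_id):
    """Application-level cleanup: delete everything a failed run committed."""
    conn.execute("DELETE FROM target WHERE load_id = ?", (load_id,))
    conn.commit()


def run_naive_retry(conn, load_id, rows, rows_per_tx, fail_first=None):
    """Re-run after a failure without cleanup: rows committed by the failed
    attempt are inserted again, producing duplicates."""
    try:
        export_run(conn, load_id, rows, rows_per_tx, fail_after_tx=fail_first)
    except ExportFailed:
        export_run(conn, load_id, rows, rows_per_tx)


def run_idempotent_export(conn, load_id, rows, rows_per_tx, fail_first=None):
    """Treat a failed attempt as a failed run: clean up this load_id,
    then re-run once."""
    try:
        export_run(conn, load_id, rows, rows_per_tx, fail_after_tx=fail_first)
    except ExportFailed:
        cleanup_load(conn, load_id)
        export_run(conn, load_id, rows, rows_per_tx)


if __name__ == "__main__":
    rows = [(i, f"v{i}") for i in range(10)]
    naive, clean = new_target(), new_target()
    # The first attempt commits 2 chunks of 3 rows (6 rows), then fails.
    run_naive_retry(naive, "run-1", rows, rows_per_tx=3, fail_first=2)
    run_idempotent_export(clean, "run-1", rows, rows_per_tx=3, fail_first=2)
    print(counts(naive), counts(clean))  # (16, 10) (10, 10)
```

Here is how this maps onto the reporter's command. Assume each transaction covers sqoop.export.records.per.statement × sqoop.export.statements.per.transaction rows. Then `rows_per_tx` would be 50,000 × 100 = 5,000,000. With 237,371,726 records and 8 mappers, and assuming evenly sized splits, each mapper handles about 29.7 million rows, or roughly six transactions. A failed attempt could therefore leave up to five committed transactions (about 25 million rows) in the table before it is re-run.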