Jarek Jarcec Cecho created SQOOP-2055:
-----------------------------------------

             Summary: Run only one map task attempt during export
                 Key: SQOOP-2055
                 URL: https://issues.apache.org/jira/browse/SQOOP-2055
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.5
            Reporter: Jarek Jarcec Cecho
            Assignee: Jarek Jarcec Cecho
             Fix For: 1.4.6


While investigating several user issues, I've noticed that our [documentation 
states that a failed export mapper fails the entire 
job|http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_failed_exports]:

{quote}
If an export map task fails due to these or other reasons, it will cause the 
export job to fail. The results of a failed export are undefined. Each export 
map task operates in a separate transaction. Furthermore, individual map tasks 
commit their current transaction periodically. If a task fails, the current 
transaction will be rolled back. Any previously-committed transactions will 
remain durable in the database, leading to a partially-complete export.
{quote}

This is, however, not the observed behavior, as MapReduce will re-run a failed 
mapper (up to 3 times) before failing the job. This is confusing when 
investigating failures, because most often one has to go to the first failed 
attempt and ignore the rest, as they usually fail on issues unrelated to the 
original failure (typically key constraint violations on rows that the first 
attempt already committed).
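
The effect can be reproduced with a toy simulation. This is a minimal sketch in plain Python, not Sqoop or Hadoop code: the in-memory "table", the commit interval of two rows, the record values, and the error messages are all assumptions made up for illustration. It models a map task that commits periodically, fails on one bad record, and is then re-run against the same table.

```python
class DuplicateKey(Exception):
    """Stand-in for a primary-key violation raised by the database."""


def run_attempt(table, records, bad_record, commit_every=2):
    """Insert records into `table` (a set acting as a primary-key column),
    committing every `commit_every` rows. Rows still pending when the
    attempt fails are discarded, which models the rollback."""
    pending = []
    for rec in records:
        if rec == bad_record:
            # Original failure: the current transaction is lost,
            # earlier commits stay in the table.
            raise ValueError(f"cannot parse record {rec}")
        if rec in table or rec in pending:
            raise DuplicateKey(f"duplicate key {rec}")
        pending.append(rec)
        if len(pending) == commit_every:
            table.update(pending)  # periodic commit
            pending = []
    table.update(pending)  # final commit


def run_task(records, bad_record, max_attempts):
    """Run the same map task up to `max_attempts` times against one table
    and collect the error message of every failed attempt."""
    table, errors = set(), []
    for attempt in range(1, max_attempts + 1):
        try:
            run_attempt(table, records, bad_record)
            break
        except (ValueError, DuplicateKey) as exc:
            errors.append(f"attempt {attempt}: {exc}")
    return sorted(table), errors


if __name__ == "__main__":
    records = [1, 2, 3, 4, 5, 6]
    # Four attempts in total: the first run plus three re-runs.
    for line in run_task(records, bad_record=6, max_attempts=4)[1]:
        print(line)
    # A single attempt reports only the real cause.
    print(run_task(records, bad_record=6, max_attempts=1)[1])
```

In this sketch only the first attempt reports the real cause (the unparsable record). Every re-run fails immediately on the first row that was already committed, which is the misleading key-constraint error described above. With a single attempt, the only error reported is the original one.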

It seems that some of the connectors are smart enough to either suggest that 
the user configure MapReduce to run a single attempt or to do it automatically 
([PGDump|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/postgresql/PGBulkloadExportJob.java#L139],
[OraOop|https://github.com/apache/sqoop/blob/trunk/src/docs/user/connectors.txt#L831]).
I would like to propose applying this behavior to every export job, as that 
seems like a more reasonable default for export jobs.

Doing this might have a side effect on more advanced connectors that make each 
mapper attempt idempotent (e.g. by using temporary tables per map attempt 
or a similar facility): we would stop re-running their failed attempts 
automatically, and those connectors would have to re-enable this behavior on 
their own.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
