mridulm commented on PR #50630:
URL: https://github.com/apache/spark/pull/50630#issuecomment-2814578661

   > It can use a FileOutputCommitter where each task does a Hadoop task commit. In case of a re-submit, this will lead to re-committing that Hadoop task (possibly with different content).
   
   Only unsuccessful (and therefore uncommitted) tasks are candidates for (re)execution - and so for commit - not completed tasks.
   So if a partition has a completed task commit, it won't be re-executed - Spark ensures this in its use of `FileOutputCommitter`.
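
   To make that guarantee concrete: commit permission is granted to at most one task attempt per partition, and every other attempt is denied. A minimal sketch of that "first committer wins" rule (the names here are illustrative, not Spark's internal API):

   ```scala
   // Minimal sketch of the "first committer wins" coordination Spark
   // enforces around task commit (names are illustrative, not Spark's API).
   import scala.collection.concurrent.TrieMap

   class CommitArbiterSketch {
     // partitionId -> task attempt that was authorized to commit
     private val authorized = TrieMap.empty[Int, Int]

     /** Grant commit permission to at most one attempt per partition. */
     def canCommit(partitionId: Int, attemptNumber: Int): Boolean =
       authorized.putIfAbsent(partitionId, attemptNumber) match {
         case None                  => true  // first asker wins
         case Some(`attemptNumber`) => true  // same attempt asking again
         case Some(_)               => false // another attempt already committed
       }
   }

   object CommitArbiterDemo extends App {
     val arbiter = new CommitArbiterSketch
     assert(arbiter.canCommit(partitionId = 0, attemptNumber = 0))  // commits
     assert(!arbiter.canCommit(partitionId = 0, attemptNumber = 1)) // denied, never commits
   }
   ```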
   
   > In case of a JDBC write, it can have already inserted all rows of a partition into the target schema.
   
   As discussed [here](https://github.com/apache/spark/pull/50033#issuecomment-2808115624), this is a bug in the JDBC implementation - the transaction commit should be done in the task commit, not as part of `foreachPartition(savePartition)`.
   It is not expected to work correctly in all scenarios, and in the case observed, it did end up failing.
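
   To sketch the shape of the fix being suggested (illustrative only - the JDBC binding details are assumptions, not the actual patch), using Spark's public DataSource V2 writer interface:

   ```scala
   // Hedged sketch: keep the JDBC transaction open while rows are written,
   // and COMMIT only from the task-commit hook, so an attempt that never
   // reaches task commit leaves no rows behind.
   import java.sql.{Connection, PreparedStatement}
   import org.apache.spark.sql.catalyst.InternalRow
   import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}

   class JdbcTxnDataWriter(conn: Connection, stmt: PreparedStatement)
       extends DataWriter[InternalRow] {

     conn.setAutoCommit(false) // all inserts stay inside one open transaction

     override def write(row: InternalRow): Unit = {
       // bind the row's columns to `stmt` here (omitted for brevity)
       stmt.addBatch()
     }

     override def commit(): WriterCommitMessage = {
       stmt.executeBatch()
       conn.commit() // the transaction commits only as part of *task* commit
       new WriterCommitMessage {} // marker message; nothing to report back
     }

     override def abort(): Unit = conn.rollback() // failed attempt: no rows land

     override def close(): Unit = conn.close()
   }
   ```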
   
   
   > But this is not enough, as a FetchFailed from a determinate stage can lead to an executor loss and a re-compute of the indeterminate parent of the result stage, as shown in the attached unit test.
   
   The fix for this is to add handling similar [to this](https://github.com/apache/spark/blob/8b33a832b2ba4f8bb7ed34dac50778bf8cbcfa13/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2131).
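
   To spell out the rule that the linked logic enforces, as an illustrative model rather than Spark's actual code:

   ```scala
   // Illustrative-only model of the rollback rule: recomputing an
   // INDETERMINATE stage can produce different data per partition, so every
   // stage that already consumed its output must be rolled back with it.
   object RollbackRuleSketch {
     sealed trait Determinism
     case object Determinate extends Determinism
     case object Indeterminate extends Determinism

     final case class Stage(id: Int, determinism: Determinism, parents: Seq[Stage])

     /** Ids of stages whose output must be discarded when `lost` is recomputed. */
     def stagesToRollBack(lost: Stage, allStages: Seq[Stage]): Set[Int] =
       if (lost.determinism == Determinate) {
         Set(lost.id) // a determinate rerun reproduces identical output; retry alone
       } else {
         // drag every (transitive) downstream consumer along with the rerun
         @annotation.tailrec
         def expand(acc: Set[Int]): Set[Int] = {
           val next = allStages.filter(_.parents.exists(p => acc(p.id))).map(_.id).toSet
           if (next.subsetOf(acc)) acc else expand(acc ++ next)
         }
         expand(Set(lost.id))
       }
   }
   ```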
   
   
   I have sketched a rough implementation here for reference (it is purely illustrative, and meant to convey what I was talking about).
   Assuming I am not missing anything, it should help you fix this issue.
   
   * [Option 1](https://github.com/apache/spark/compare/master...mridulm:spark:prototype-for-attila-indeterminism-discussion-option-1) handles the impact of indeterminism when processing shuffle data loss.
   * [Option 2](https://github.com/apache/spark/compare/master...mridulm:spark:prototype-for-attila-indeterminism-discussion-option-2-subMissingTasks) does this when computing an indeterminate stage.
   
   Option 1 is much more aggressive with cleanup, but might spuriously kill jobs far more often than required.
   If option 2 is correct, I would prefer it, as it is much more conservative about aborting stages and failing jobs.
   
   (I have adapted the tests you included in this PR for both)

