mridulm commented on PR #50630:
URL: https://github.com/apache/spark/pull/50630#issuecomment-2814578661

   > It can use a FileOutputCommitter where each task does a Hadoop task commit. In case of a re-submit, this will lead to re-committing that Hadoop task (possibly with different content).
   
   Only unsuccessful (and therefore uncommitted) tasks are candidates for (re)execution - and so for commit - not completed tasks.
   So if a partition has a completed task commit, it won't be re-executed - Spark ensures this in its use of `FileOutputCommitter`.
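
   To make that guarantee concrete: commit permission is granted to at most one task attempt per partition, and every other attempt is denied. A minimal sketch of that "first committer wins" rule (the names here are illustrative, not Spark's internal API):

   ```scala
   // Minimal sketch of the "first committer wins" coordination Spark
   // enforces around task commit (names are illustrative, not Spark's API).
   import scala.collection.concurrent.TrieMap

   class CommitArbiterSketch {
     // partitionId -> task attempt that was authorized to commit
     private val authorized = TrieMap.empty[Int, Int]

     /** Grant commit permission to at most one attempt per partition. */
     def canCommit(partitionId: Int, attemptNumber: Int): Boolean =
       authorized.putIfAbsent(partitionId, attemptNumber) match {
         case None                  => true  // first asker wins
         case Some(`attemptNumber`) => true  // same attempt asking again
         case Some(_)               => false // another attempt already committed
       }
   }

   object CommitArbiterDemo extends App {
     val arbiter = new CommitArbiterSketch
     assert(arbiter.canCommit(partitionId = 0, attemptNumber = 0))  // commits
     assert(!arbiter.canCommit(partitionId = 0, attemptNumber = 1)) // denied, never commits
   }
   ```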
   
   > In case of a JDBC write, it can have already inserted all rows of a partition into the target schema.
   
   As discussed [here](https://github.com/apache/spark/pull/50033#issuecomment-2808115624), this is a bug in the JDBC implementation - the transaction commit should be done in the task commit, not as part of `foreachPartition(savePartition)`.
   It is not expected to work correctly in all scenarios, and in the case observed, it did end up failing.
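
   To sketch the shape of the fix being suggested (illustrative only - the JDBC binding details are assumptions, not the actual patch), using Spark's public DataSource V2 writer interface:

   ```scala
   // Hedged sketch: keep the JDBC transaction open while rows are written,
   // and COMMIT only from the task-commit hook, so an attempt that never
   // reaches task commit leaves no rows behind.
   import java.sql.{Connection, PreparedStatement}
   import org.apache.spark.sql.catalyst.InternalRow
   import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}

   class JdbcTxnDataWriter(conn: Connection, stmt: PreparedStatement)
       extends DataWriter[InternalRow] {

     conn.setAutoCommit(false) // all inserts stay inside one open transaction

     override def write(row: InternalRow): Unit = {
       // bind the row's columns to `stmt` here (omitted for brevity)
       stmt.addBatch()
     }

     override def commit(): WriterCommitMessage = {
       stmt.executeBatch()
       conn.commit() // the transaction commits only as part of *task* commit
       new WriterCommitMessage {} // marker message; nothing to report back
     }

     override def abort(): Unit = conn.rollback() // failed attempt: no rows land

     override def close(): Unit = conn.close()
   }
   ```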
   
   
   > But this is not enough, as a FetchFailed from a determinate stage can lead to an executor loss and a re-compute of the indeterminate parent of the result stage, as shown in the attached unit test.
   
   The fix for this is to add handling similar [to this](https://github.com/apache/spark/blob/8b33a832b2ba4f8bb7ed34dac50778bf8cbcfa13/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2131).
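
   To spell out the rule that the linked logic enforces, as an illustrative model rather than Spark's actual code:

   ```scala
   // Illustrative-only model of the rollback rule: recomputing an
   // INDETERMINATE stage can produce different data per partition, so every
   // stage that already consumed its output must be rolled back with it.
   object RollbackRuleSketch {
     sealed trait Determinism
     case object Determinate extends Determinism
     case object Indeterminate extends Determinism

     final case class Stage(id: Int, determinism: Determinism, parents: Seq[Stage])

     /** Ids of stages whose output must be discarded when `lost` is recomputed. */
     def stagesToRollBack(lost: Stage, allStages: Seq[Stage]): Set[Int] =
       if (lost.determinism == Determinate) {
         Set(lost.id) // a determinate rerun reproduces identical output; retry alone
       } else {
         // drag every (transitive) downstream consumer along with the rerun
         @annotation.tailrec
         def expand(acc: Set[Int]): Set[Int] = {
           val next = allStages.filter(_.parents.exists(p => acc(p.id))).map(_.id).toSet
           if (next.subsetOf(acc)) acc else expand(acc ++ next)
         }
         expand(Set(lost.id))
       }
   }
   ```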
   
   
   I have sketched a rough implementation here for reference (it is purely illustrative, and meant to convey what I was talking about).
   Assuming I am not missing anything, it should help you fix this issue.
   
   * [Option 1](https://github.com/apache/spark/compare/master...mridulm:spark:prototype-for-attila-indeterminism-discussion-option-1) handles the impact of indeterminism when processing shuffle data loss.
   * [Option 2](https://github.com/apache/spark/compare/master...mridulm:spark:prototype-for-attila-indeterminism-discussion-option-2-subMissingTasks) does this when computing an indeterminate stage.
   
   Option 1 is much more aggressive with cleanup, but might spuriously kill jobs far more often than required.
   If option 2 is correct, I would prefer it, as it is much more conservative about aborting stages and failing jobs.
   
   (I have adapted the tests you included in this PR for both)

