When describing the resilience of the magic committer to failures during a
task commit, the docs state:

"If the .pendingset file has been saved to the job attempt directory, the
task has effectively committed, it has just failed to report to the
controller. This will cause complications during job commit, as there may
be two task PendingSet committing the same files, or committing files with

*Proposed*: track task ID in pendingsets, recognise duplicates on load and
then respond by cancelling one set and committing the other. (or fail?)"

As far as I can tell from reading over the code, the proposal was not
implemented. Is this still considered a viable solution? If so, I'd be
happy to take a crack at it.
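
To make that concrete, here is a rough sketch of the kind of duplicate
detection I have in mind at job commit time. The PendingSet name mirrors
the real class in org.apache.hadoop.fs.s3a.commit.files, but everything
else here (the taskId field, the selection logic) is hypothetical, just
illustrating "keep one attempt per task, cancel the rest":

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch, not the real S3A API: deduplicate task
    // pendingsets at job commit using the task ID the proposal would add.
    public class PendingSetDedup {

      // Stand-in for the real PendingSet, extended with a task ID.
      static class PendingSet {
        final String taskId;        // shared by all attempts of one task
        final String taskAttemptId; // distinguishes individual attempts
        PendingSet(String taskId, String taskAttemptId) {
          this.taskId = taskId;
          this.taskAttemptId = taskAttemptId;
        }
      }

      // Keep the first pendingset seen for each task ID; collect the rest
      // as duplicates whose pending uploads should be aborted.
      static List<PendingSet> selectWinners(List<PendingSet> loaded,
                                            List<PendingSet> duplicates) {
        Map<String, PendingSet> byTaskId = new LinkedHashMap<>();
        for (PendingSet ps : loaded) {
          if (byTaskId.putIfAbsent(ps.taskId, ps) != null) {
            duplicates.add(ps); // a second attempt of an already-seen task
          }
        }
        return new ArrayList<>(byTaskId.values());
      }

      public static void main(String[] args) {
        List<PendingSet> loaded = List.of(
            new PendingSet("task_000042", "attempt_000042_0"),
            new PendingSet("task_000043", "attempt_000043_0"),
            new PendingSet("task_000042", "attempt_000042_1")); // duplicate
        List<PendingSet> duplicates = new ArrayList<>();
        List<PendingSet> winners = selectWinners(loaded, duplicates);
        System.out.println("commit " + winners.size()
            + " pendingsets, abort " + duplicates.size());
      }
    }

Whether to commit the survivor or fail the job is the open question the
docs raise ("or fail?"); committing either attempt seems safe as long as
the loser's pending multipart uploads are aborted, since nothing becomes
visible until job commit completes the uploads.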

For context: we are running Spark jobs using the magic committer in a k8s
environment where executor loss is somewhat common (roughly 2% of
executors are terminated by the cluster). By default, Spark's
OutputCommitCoordinator fails the whole write stage (and its parent job)
if an executor fails while it is an "authorized committer" (see
https://issues.apache.org/jira/browse/SPARK-39195).
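
For anyone unfamiliar with the coordinator, my simplified mental model of
the driver-side arbitration looks roughly like this (a hypothetical
sketch, not the actual OutputCommitCoordinator code):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical model of Spark's driver-side commit arbitration.
    public class CommitArbiter {
      // partition -> task attempt authorized to commit that partition
      private final Map<Integer, Integer> authorized = new HashMap<>();

      // The first attempt to ask for a partition wins; repeat requests
      // from the same attempt are also allowed.
      public synchronized boolean canCommit(int partition, int attempt) {
        Integer winner = authorized.putIfAbsent(partition, attempt);
        return winner == null || winner == attempt;
      }

      // If the executor running an authorized attempt is lost before its
      // commit outcome is known, the driver cannot tell whether the task
      // committed, so rather than risk a duplicate commit it fails the
      // whole stage.
      public synchronized void executorLost(int partition, int attempt) {
        if (Integer.valueOf(attempt).equals(authorized.get(partition))) {
          throw new IllegalStateException(
              "authorized committer lost: failing stage");
        }
      }
    }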

This results in expensive job failures at the final write stage. We can
avoid those failures by disabling the OutputCommitCoordinator entirely, but
that leaves us open to the failure mode described above in the s3a docs.
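
For reference, the switch we use to disable it is Spark's internal
outputCommitCoordination.enabled flag, which SparkHadoopMapRedUtil.commitTask
consults before asking the driver for permission to commit; it is
undocumented, so worth verifying against your Spark version:

    spark.hadoop.outputCommitCoordination.enabled=false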

It seems to me that if the magic committer implemented the above proposal
to track task IDs in the pendingset, the OutputCommitCoordinator would no
longer be needed at all. I think this ticket (
https://issues.apache.org/jira/browse/SPARK-19790) discusses similar ideas
about how to make Spark's commit coordinator work with the s3a committers.
