Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

via GitHub Thu, 10 Apr 2025 12:15:14 -0700


ahshahid commented on PR #50033:
URL: https://github.com/apache/spark/pull/50033#issuecomment-2784748105


   I will get back to you. I may have mixed up the bug name.
   This is the third issue that was identified.
   Let me explain it. I think this is the issue that results in data addition 
rather than loss.
   The idea behind the code change and the intended behaviour is this:
   If the result stage depends on both a determinate and an indeterminate 
shuffle stage, and the first result task that fails does so because of the 
determinate stage, then even though the failing shuffle stage is determinate, 
**the code should still retry all partitions of both the determinate and the 
indeterminate shuffle stages.**
   This is because, at the point of the first result task failure, it is not 
known whether any partition of the indeterminate shuffle stage has also been 
lost.
   If one has been lost and we accept any subsequent successful result task, 
the query is going to give wrong results.
   I will go through the code again to validate that.
   For the same reason, if the first result task succeeds and a second task 
then fails because of the determinate shuffle stage, the query should be 
aborted.
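
   The rule described above can be sketched roughly as follows. This is only 
an illustration of the intended decision, not the actual DAGScheduler code: 
`ParentStage`, `Action` and `on_result_task_fetch_failure` are hypothetical 
names, and the all-determinate branch is an assumption added to make the 
sketch complete.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETRY_LOST_ONLY = "recompute only the lost partitions"
    RETRY_ALL = "retry all partitions of every parent shuffle stage"
    ABORT = "abort the query"


@dataclass(frozen=True)
class ParentStage:
    # Hypothetical stand-in for a parent shuffle stage of the result stage.
    name: str
    indeterminate: bool


def on_result_task_fetch_failure(parents, failed_parent, completed_result_tasks):
    """Decide what to do when a result task fails while fetching shuffle data.

    `failed_parent` is the stage whose output could not be fetched. It is
    deliberately not consulted once any parent is indeterminate: at this point
    we cannot tell whether the indeterminate stage lost partitions as well.
    """
    if not any(p.indeterminate for p in parents):
        # Assumption: with only determinate parents, recomputing the lost
        # partitions reproduces the same data, so a partial retry is safe.
        return Action.RETRY_LOST_ONLY
    if completed_result_tasks > 0:
        # Some result tasks were already accepted, so a full retry could
        # combine outputs from different attempts of the indeterminate stage.
        return Action.ABORT
    # No result task has been accepted yet: retry everything from scratch.
    return Action.RETRY_ALL


det = ParentStage("determinate-shuffle", indeterminate=False)
indet = ParentStage("indeterminate-shuffle", indeterminate=True)

first_failure = on_result_task_fetch_failure([det, indet], det, 0)
later_failure = on_result_task_fetch_failure([det, indet], det, 1)
```

   Here `first_failure` corresponds to the first case (retry all partitions 
of both stages) and `later_failure` to the second case (abort), even though 
the failing stage is the determinate one in both calls.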


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


