ahshahid commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2784748105
Will get back to you. I may have mixed up the bug name. This is the third issue that was identified. Let me explain it; I think this is the one that results in data being added rather than lost.

The idea behind the code change and the intended behaviour is this: if the result stage depends on both a determinate and an indeterminate shuffle stage, and the first result task that fails does so because of the determinate stage, then even though the failing shuffle stage is determinate, **the code should still retry all partitions of both the determinate and the indeterminate shuffle stage.** That is because, at the point of the first result task failure, it is not known whether any partition of the indeterminate shuffle stage has also been lost. If one has been lost and we accept any subsequent successful result task, the query will give wrong results. I will go through the code again to validate that.

For the same reason, if the first result task succeeds and the second task fails because of the determinate shuffle stage, the query should be aborted.
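To make the rule above concrete, here is a rough toy model in Python. It is only a sketch of the decision as described in this comment, not the actual scheduler code in the PR: `ShuffleStage`, `Action`, and `on_result_task_fetch_failure` are hypothetical names, and the all-determinate branch (recompute only the missing partitions) is my assumption about the pre-existing behaviour.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    """Hypothetical outcomes for a result-task fetch failure."""
    RETRY_MISSING_PARTITIONS = "retry_missing_partitions"
    RETRY_ALL_PARENT_PARTITIONS = "retry_all_parent_partitions"
    ABORT_QUERY = "abort_query"


@dataclass(frozen=True)
class ShuffleStage:
    """Hypothetical stand-in for a parent shuffle stage of the result stage."""
    name: str
    indeterminate: bool


def on_result_task_fetch_failure(
    parents: List[ShuffleStage],
    failed_parent: ShuffleStage,
    completed_result_tasks: int,
) -> Action:
    """Decide what to do when a result task fails fetching shuffle data.

    `failed_parent` is deliberately not consulted: the point of the rule is
    that the decision must not depend on whether the failing parent itself
    is determinate, only on whether any parent is indeterminate.
    """
    if not any(p.indeterminate for p in parents):
        # Assumption: with only determinate parents, recomputing just the
        # missing partitions reproduces the same data, so that is enough.
        return Action.RETRY_MISSING_PARTITIONS

    if completed_result_tasks > 0:
        # Some result output was already accepted. It cannot be known whether
        # an indeterminate partition was also lost, so a retry could mix old
        # and new data: abort instead.
        return Action.ABORT_QUERY

    # First result task failure and nothing accepted yet: retry every
    # partition of both the determinate and the indeterminate parents.
    return Action.RETRY_ALL_PARENT_PARTITIONS


if __name__ == "__main__":
    det = ShuffleStage("determinate_parent", indeterminate=False)
    indet = ShuffleStage("indeterminate_parent", indeterminate=True)
    # Failure caused by the determinate parent, before any result task succeeded.
    print(on_result_task_fetch_failure([det, indet], det, completed_result_tasks=0))
    # Same failure, but one result task had already succeeded.
    print(on_result_task_fetch_failure([det, indet], det, completed_result_tasks=1))
```

The two calls at the bottom correspond to the two scenarios in this comment: the first failure is caused by the determinate stage, and the failure arrives after an earlier result task has already succeeded.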