Ngone51 commented on PR #52606: URL: https://github.com/apache/spark/pull/52606#issuecomment-3516567220
I just realized that we cannot clean only the shuffle files while leaving the shuffle statuses in place. For cases like DataFrame queries, it is very common to reuse a DataFrame across queries. After the DataFrame is executed for the first time, its related shuffle files are all cleaned, but the shuffle statuses still exist. So when the DataFrame is reused to run queries, Spark would mistakenly assume the shuffle files are still there given the existing shuffle statuses, but then fail at runtime due to the missing shuffle files. I have pushed a new proposal, which tries to fail the query (usually the subquery) when the shuffle is no longer registered.
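To make the failure mode concrete, here is a minimal sketch of the stale-status trap. All names (`ShuffleRegistry`, `clean_files_only`, etc.) are hypothetical stand-ins, not Spark's actual classes; the point is only to show why cleaning files without also dropping the status leads to a runtime failure on reuse, and why unregistering the shuffle lets the (sub)query fail fast instead.

```python
# Hypothetical sketch, not Spark code: shuffle *statuses* outliving
# shuffle *files* breaks DataFrame reuse.

class ShuffleRegistry:
    def __init__(self):
        self.statuses = {}   # shuffle_id -> status (stand-in for tracker state)
        self.files = set()   # shuffle_ids whose map output files still exist

    def register(self, shuffle_id):
        self.statuses[shuffle_id] = {"id": shuffle_id}
        self.files.add(shuffle_id)

    def clean_files_only(self, shuffle_id):
        # The problematic cleanup: files go away, status stays behind.
        self.files.discard(shuffle_id)

    def unregister(self, shuffle_id):
        # Direction of the new proposal: drop the status together with
        # the files, so a reuse attempt can fail fast and cleanly.
        self.statuses.pop(shuffle_id, None)
        self.files.discard(shuffle_id)

    def read(self, shuffle_id):
        if shuffle_id not in self.statuses:
            # Shuffle no longer registered: fail the (sub)query up front.
            raise RuntimeError(f"shuffle {shuffle_id} is no longer registered")
        if shuffle_id not in self.files:
            # Stale-status trap: planning believes the data exists,
            # but the fetch fails mid-query at runtime.
            raise IOError(f"shuffle {shuffle_id}: status exists, files missing")
        return self.statuses[shuffle_id]
```

A reused DataFrame hits the `IOError` branch under files-only cleanup, whereas full unregistration surfaces the clean `RuntimeError` before any fetch is attempted.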
