dongjoon-hyun commented on PR #54558:
URL: https://github.com/apache/spark/pull/54558#issuecomment-3977139547

   > Would it make sense to set spark.task.cpus number of cores in recovery mode and don't disable it if >1?
   
   Of course, that sounds better to me because `spark.task.cpus` is the theoretical minimum. We simply need to revise the PR description to match the code.
   
   > Yes, I agree that recovering from an OOM is a huge win. My question is mainly about subsequent stages. If there is no resource profile set for them, will/might those stages use the 1 core executor? If yes, then I still consider the PR a nice improvement, just we probably need to call out this behaviour in our documentation or in config description so that users could decide whether they want their jobs to fail fast or complete with maybe increased runtime.
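   As a side note on the resource-profile question above: a job can already pin subsequent stages to regular executors via stage-level scheduling. A minimal sketch using the public `ResourceProfileBuilder` API (the 4-core/8g executor shape is an assumed example, and stage-level scheduling requires dynamic allocation on supported cluster managers):

   ```scala
   import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
   import org.apache.spark.sql.SparkSession

   object StageProfileSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().appName("stage-profile-sketch").getOrCreate()
       val sc = spark.sparkContext

       // Example executor shape for the later stages (assumed numbers).
       val executorReqs = new ExecutorResourceRequests().cores(4).memory("8g")
       val taskReqs = new TaskResourceRequests().cpus(1)
       val profile = new ResourceProfileBuilder()
         .require(executorReqs)
         .require(taskReqs)
         .build()

       // Stages computed from this RDD request 4-core executors, so they
       // would not be scheduled onto a 1-core recovery executor.
       val result = sc.parallelize(1 to 100).withResources(profile).map(_ * 2).sum()
       println(result)
       spark.stop()
     }
   }
   ```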
   
   I understand your fail-fast requirement. Technically, you want to give users the ability to disable the whole feature, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
