dongjoon-hyun commented on PR #54558: URL: https://github.com/apache/spark/pull/54558#issuecomment-3977139547
> Would it make sense to set `spark.task.cpus` number of cores in recovery mode and not disable it if > 1?

Of course, that sounds better to me because it is the theoretical minimum. We simply need to revise our abstract according to this code.

> Yes, I agree that recovering from an OOM is a huge win. My question is mainly about subsequent stages. If there is no resource profile set for them, will/might those stages use the 1-core executor? If yes, then I still consider the PR a nice improvement, but we probably need to call out this behaviour in our documentation or in the config description so that users can decide whether they want their jobs to fail fast or complete with a possibly increased runtime.

I understand your fail-fast requirement. Technically, you want to give users the right to disable the whole feature, right?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
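For context, the two knobs being discussed are standard Spark configs. A minimal sketch of the `> 1` case above, where tasks reserve more than one core and the suggestion is to size the recovery executor to `spark.task.cpus` rather than hard-coding a single core, might look like this (illustrative values only):

```properties
# spark-defaults.conf -- illustrative values, not from the PR

# Cores available on each executor.
spark.executor.cores   4

# Cores reserved per task (default 1). With the suggestion above,
# a recovery-mode executor would be launched with this many cores,
# the theoretical minimum needed to run one task, instead of 1.
spark.task.cpus        2
```

Stages that set their own resource profile would override these defaults; the open question in the thread is what stages *without* a profile do after recovery.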
