azagrebin commented on issue #11408: [FLINK-15989][FLINK-16225] Improve direct and metaspace out-of-memory error handling
URL: https://github.com/apache/flink/pull/11408#issuecomment-600217460

Thanks for the review @tillrohrmann

As I understand it, an OOM error can happen almost anywhere. A Metaspace OOM in particular can occur on any class loading, not only in user code. Therefore, my idea was to catch it in common places, as you also mentioned: in the outermost exception handling of known threads, where the error is either thrown or propagated to:
- task thread
- RPC endpoint thread
- network threads

I thought that the task thread is the most likely place to catch the error, but you are right, I can look into other places as well. Unless I am missing something, `TaskExecutor#submitTask` does not look special compared to other RPC handlers running in the RPC main thread. The RPC endpoint could get a central error-handling place for this case, unless it already has one and I overlooked it. I would assume that it does not make sense for the RPC endpoint to survive an OOM error, but it should at least report it to the JVM if possible. I can also write tests that emulate throwing errors in those known threads and check that `onFatalError` is properly called.

By adding an error message to the OOM exception, I suppose you mean either creating a new one with a more detailed message or wrapping the original in another, more descriptive OOM exception. This would also have to be done in the same places where the error is currently logged. I did not really want to touch the original error generated by the JVM, but I think you are right about reporting it to the JM. I can change the message of the error.

As for the case of `taskmanager.jvm-exit-on-oom: true`, indeed, I overlooked that. The error message can also be improved. Actually, the `onFatalError` API could perhaps take an additional parameter specifying how to terminate the JVM.
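A minimal plain-Java sketch of the approach discussed above, under stated assumptions: a guard around a thread's outermost logic catches `OutOfMemoryError`, wraps direct and Metaspace OOMs in a new `OutOfMemoryError` with a more descriptive message (keeping the JVM-generated error as the cause), and passes the result to an `onFatalError` callback. `FatalErrorHandler`, `OomErrorUtils`, `guard`, `enrich` and `OomHandlingDemo` are hypothetical names for illustration, not Flink's actual classes, and the hint texts are placeholders. Detecting the OOM type by message (`"Direct buffer memory"`, `"Metaspace"`) relies on HotSpot's messages and may not hold on other JVMs.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a fatal error callback with an onFatalError method. */
interface FatalErrorHandler {
    void onFatalError(Throwable t);
}

final class OomErrorUtils {
    private OomErrorUtils() {}

    // Assumption: HotSpot reports direct memory exhaustion with this message.
    static boolean isDirectOom(Throwable t) {
        return t instanceof OutOfMemoryError
                && t.getMessage() != null
                && t.getMessage().contains("Direct buffer memory");
    }

    // Assumption: HotSpot reports Metaspace exhaustion with this message.
    static boolean isMetaspaceOom(Throwable t) {
        return t instanceof OutOfMemoryError
                && t.getMessage() != null
                && t.getMessage().contains("Metaspace");
    }

    /**
     * Wraps a direct or Metaspace OOM in a new OOM with a more descriptive message,
     * keeping the original JVM error as the cause. Other throwables are returned unchanged.
     */
    static Throwable enrich(Throwable t) {
        String hint;
        if (isDirectOom(t)) {
            hint = "Direct memory limit exceeded (placeholder hint).";
        } else if (isMetaspaceOom(t)) {
            hint = "Metaspace limit exceeded, possibly due to a class loading leak (placeholder hint).";
        } else {
            return t;
        }
        OutOfMemoryError wrapped = new OutOfMemoryError(t.getMessage() + ". " + hint);
        wrapped.initCause(t); // cause is still unset, so initCause succeeds
        return wrapped;
    }

    /** Runs the action as the outermost handler of a thread and routes any OOM to the handler. */
    static Runnable guard(Runnable action, FatalErrorHandler handler) {
        return () -> {
            try {
                action.run();
            } catch (OutOfMemoryError oom) {
                // Note: under a real OOM, allocating the wrapped error may itself fail.
                handler.onFatalError(enrich(oom));
            }
        };
    }
}

final class OomHandlingDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Throwable> reported = new AtomicReference<>();
        // Emulate a Metaspace OOM thrown in a known thread.
        Thread worker = new Thread(OomErrorUtils.guard(
                () -> { throw new OutOfMemoryError("Metaspace"); },
                reported::set));
        worker.start();
        worker.join();
        System.out.println("onFatalError received: " + reported.get().getMessage());
    }
}
```

Wrapping rather than mutating leaves the JVM-generated error untouched as the cause while still giving the receiver of `onFatalError` a descriptive message. The demo emulates a Metaspace OOM in a worker thread, which is also roughly the shape a test checking that `onFatalError` is called could take.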