azagrebin commented on issue #11408: [FLINK-15989][FLINK-16225] Improve direct 
and metaspace out-of-memory error handling
URL: https://github.com/apache/flink/pull/11408#issuecomment-600217460
 
 
   Thanks for the review @tillrohrmann 
   
   As I understand it, an OOM can generally happen anywhere, not only in user
code; this is especially true of a Metaspace OOM, which any class loading can
trigger. Therefore, my idea was to catch it in common places, as you also
mentioned: in the outermost exception handling of the known threads, or
wherever the error is propagated to them:
   - task thread
   - RPC endpoint thread
   - network threads
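
   To illustrate the idea of catching the error in the outermost handler of a
known thread, here is a minimal sketch. `FatalErrorHandlerSketch` and
`OutermostErrorHandling` are hypothetical names made up for this example; they
are stand-ins, not Flink's actual classes, and the thrown error is emulated.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a fatal error callback; not Flink's real interface.
interface FatalErrorHandlerSketch {
    void onFatalError(Throwable t);
}

final class OutermostErrorHandling {

    // Wraps a thread body so that any Error escaping it reaches the handler.
    static Runnable guarded(Runnable body, FatalErrorHandlerSketch handler) {
        return () -> {
            try {
                body.run();
            } catch (Error e) {
                // Covers OutOfMemoryError (heap, direct, metaspace) and other JVM errors.
                handler.onFatalError(e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread thread = new Thread(guarded(() -> {
            // Emulated error; a real metaspace OOM would come from class loading.
            throw new OutOfMemoryError("Metaspace");
        }, seen::set), "task-thread-sketch");
        thread.start();
        thread.join();
        System.out.println("reported: " + seen.get());
    }
}
```

   The same wrapper could in principle be applied to the task thread, the RPC
main thread and the network threads, which is why a single helper is used here.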
   
   I thought that the task thread was the most likely place to catch the
error, but you are right, I can look into the other places as well.
   
   Unless I am missing something, `TaskExecutor#submitTask` does not look
special compared to the other RPC handlers running in the RPC main thread. The
RPC endpoint could have a central error handling place for this case, unless
one exists and I overlooked it. I would assume that it does not make sense for
the RPC endpoint to survive an OOM error, but it should at least report it to
the JVM if possible.
   
   I can also write tests that emulate errors thrown in those known threads
and check that `onFatalError` is called properly.
   
   By adding an error message to the OOM exception, I suppose you mean either
creating a new one with a more detailed message or wrapping it in another,
more descriptive OOM exception. This would also have to be done in the same
places where the logging currently happens. I was somewhat reluctant to touch
the original error generated by the JVM, but I think you are right about
reporting it to the JM. I can change the message of the error.
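
   A rough sketch of the wrapping variant, assuming the type of OOM is
recognised from the JVM's own message text ("Metaspace" or "Direct buffer
memory"). The class name `OomMessages` and the hint wording are hypothetical
and only serve the example; the original error is kept as the cause so that
nothing generated by the JVM is lost.

```java
final class OomMessages {

    // Returns a more descriptive OutOfMemoryError for metaspace and direct
    // memory OOMs, or the original error unchanged for anything else.
    static OutOfMemoryError enrich(OutOfMemoryError original) {
        String message = original.getMessage();
        String hint;
        if (message != null && message.contains("Metaspace")) {
            // Hypothetical wording.
            hint = "Metaspace out-of-memory error; consider increasing the JVM metaspace limit.";
        } else if (message != null && message.contains("Direct buffer memory")) {
            // Hypothetical wording.
            hint = "Direct out-of-memory error; consider increasing the direct memory limit.";
        } else {
            return original;
        }
        OutOfMemoryError enriched =
                new OutOfMemoryError(hint + " Original message: " + message);
        // Keep the JVM-generated error reachable through the cause chain.
        enriched.initCause(original);
        return enriched;
    }

    public static void main(String[] args) {
        OutOfMemoryError enriched = enrich(new OutOfMemoryError("Metaspace"));
        System.out.println(enriched.getMessage());
    }
}
```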
   
   As for the case of `taskmanager.jvm-exit-on-oom: true`, I indeed overlooked
that. The error message can also be improved there. In fact, the
`onFatalError` API could perhaps take an additional parameter that says how to
terminate the JVM.
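
   One possible shape of such a parameter, purely as a sketch: the enum
`TerminationMode`, the interface `TerminatingFatalErrorHandler` and the helper
`TerminationSketch` are hypothetical names and not part of the existing API.
The `exitOnOom` flag stands for the value of `taskmanager.jvm-exit-on-oom`, and
the mapping from that flag to a mode is an assumption made for illustration.

```java
// Hypothetical: how the process should go down after a fatal error.
enum TerminationMode {
    GRACEFUL_SHUTDOWN,
    HALT_JVM
}

// Hypothetical variant of the fatal error callback with an extra parameter.
interface TerminatingFatalErrorHandler {
    void onFatalError(Throwable t, TerminationMode mode);
}

final class TerminationSketch {

    // Picks the mode: halt only for an OOM when exit-on-OOM is enabled.
    static TerminationMode modeFor(Throwable t, boolean exitOnOom) {
        return (t instanceof OutOfMemoryError && exitOnOom)
                ? TerminationMode.HALT_JVM
                : TerminationMode.GRACEFUL_SHUTDOWN;
    }

    // Forwards the error together with the chosen mode to the handler.
    static void report(Throwable t, boolean exitOnOom, TerminatingFatalErrorHandler handler) {
        handler.onFatalError(t, modeFor(t, exitOnOom));
    }

    public static void main(String[] args) {
        report(new OutOfMemoryError("Metaspace"), true,
                (error, mode) -> System.out.println(mode + " after " + error));
    }
}
```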
