StephanEwen commented on pull request #11554:
URL: https://github.com/apache/flink/pull/11554#issuecomment-621785104


   Thank you for addressing the changes. Looks good to me with two remarks:
   
   (1) Can we wrap the `ByteArrayOutputStream` into a `DataOutputStream` 
instead of using the `SerdeUtils`? It looks both simpler and more efficient and 
is also what other parts of the Flink code typically do.
   
   (2) About the "Fatal Error Handler": Virtually all of the really critical 
issues in Flink have fallen into the following category: An exception happens 
that we try to handle "fine grained" but it actually leaves something in an 
inconsistent state that is not recognized. Had we simply handled it "coarse 
grained" (fail process, rely on recovery) it would have been fine.
   
   That is why I think the default should be to escalate. If we feel that we 
are completely sure we can handle this more fine-grained, then it is fine.
   This issue is somewhat connected with the "closing" of the coordinator.
   
   How about the following approach:
     - We add a "process killing" uncaught exception handler, as the safety net.
     - We catch the exceptions in the runnables ourselves and cause a global 
job failure upon exception, unless the `SourceCoordinator` is closed already
     - The global job failure will close() the current coordinator re-create 
the coordinator from latest checkpoint (we need to still implement this)
     - closing the source coordinator gets the following changes
       - we call `shutdownNow()` on the executor to make sure all queued tasks 
do not get executed
       - we make sure the `SourceCoordinator` does not forward events from the 
Enumerator any more, so that a long-running task that is still being executed 
in the enumerator cannot affect anything any more
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to