StephanEwen commented on pull request #11554: URL: https://github.com/apache/flink/pull/11554#issuecomment-621785104
Thank you for addressing the changes. Looks good to me with two remarks: (1) Can we wrap the `ByteArrayOutputStream` into a `DataOutputStream` instead of using the `SerdeUtils`? It looks both simpler and more efficient and is also what other parts of the Flink code typically do. (2) About the "Fatal Error Handler": Virtually all of the really critical issues in Flink have fallen into the following category: An exception happens that we try to handle "fine grained" but it actually leaves something in an inconsistent state that is not recognized. Had we simply handled it "coarse grained" (fail process, rely on recovery) it would have been fine. That is why I think the default should be to escalate. If we feel that we are completely sure we can handle this more fine-grained, then it is fine. This issue is somewhat connected with the "closing" of the coordinator. How about the following approach: - We add a "process killing" uncaught exception handler, as the safety net. - We catch the exceptions in the runnables ourselves and cause a global job failure upon exception, unless the `SourceCoordinator` is closed already - The global job failure will close() the current coordinator re-create the coordinator from latest checkpoint (we need to still implement this) - closing the source coordinator gets the following changes - we call `shutdownNow()` on the executor to make sure all queued tasks do not get executed - we make sure the `SourceCoordinator` does not forward events from the Enumerator any more, so that a long-running task that is still being executed in the enumerator cannot affect anything any more ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org