[ https://issues.apache.org/jira/browse/FLINK-15156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211949#comment-17211949 ]
Hwanju Kim commented on FLINK-15156:
------------------------------------

FWIW, the following is what we have done:
* A Flink user security manager is added for general user sandbox checking. Currently only exit is checked (other checks can be added here later).
* The added security manager forwards every check it does not override to the previous security manager, if any (like a decorator).
* The security manager is set when the JM and TM start (if configured, as described in the last bullet point).
* The exit check can be enabled and disabled via a method so that it only affects user code, as the Flink runtime needs to exit in some cases (e.g., on a fatal error).
** Once enabled, any thread spawned from the main thread inherits the enable flag.
* What is enclosed by the enabled exit check is currently best-effort and does not cover all the places where user code is involved. The main places are:
** main() in JM (currently for invokeInteractiveModeForExecution)
** StreamTask.invoke, triggerCheckpoint, cancel.
* A new exception, UserSystemExitException, is defined to be thrown when user code attempts to exit the JVM. It has a default message to warn the user.
** In main(), it is wrapped into ProgramInvocationException.
** In a UDF, it fails the exiting task, thereby shipping the exception to the JM and triggering fail-over.
* This security manager is only added if the configuration (under the security section) in flink-conf.yaml is enabled (disabled by default). The configuration is per check case (but currently only disallow-system-exit is available).

Please let me know if anyone wants to review the patch, or just discuss if anything does not make sense.
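
For illustration only, the design described above could be sketched roughly as follows. This is not the actual patch: only UserSystemExitException is a name taken from the comment, while ExitGuardSecurityManager, enableUserExitCheck() and disableUserExitCheck() are hypothetical names, and the exception message is made up. The sketch assumes the enable flag is kept in an InheritableThreadLocal so that threads spawned while it is enabled inherit it. Note also that SecurityManager is deprecated for removal in recent JDKs, so installing one via System.setSecurityManager() may not be possible there.

```java
import java.security.Permission;

// Thrown when user code calls System.exit() while the check is enabled.
// The message text here is an assumption, not the one from the patch.
class UserSystemExitException extends SecurityException {
    UserSystemExitException() {
        super("User code attempted to exit the JVM via System.exit().");
    }
}

// Hypothetical decorator-style security manager: it overrides only the
// exit check and forwards everything else to the previous manager, if any.
@SuppressWarnings({"removal", "deprecation"})
class ExitGuardSecurityManager extends SecurityManager {
    private final SecurityManager previous;

    // Inheritable, so threads created while the flag is set see it as well.
    private final InheritableThreadLocal<Boolean> exitCheckEnabled =
            new InheritableThreadLocal<>();

    ExitGuardSecurityManager(SecurityManager previous) {
        this.previous = previous;
    }

    // Call before handing control to user code.
    void enableUserExitCheck() {
        exitCheckEnabled.set(Boolean.TRUE);
    }

    // Call after user code returns, so the runtime itself can still exit.
    void disableUserExitCheck() {
        exitCheckEnabled.set(Boolean.FALSE);
    }

    @Override
    public void checkExit(int status) {
        if (Boolean.TRUE.equals(exitCheckEnabled.get())) {
            throw new UserSystemExitException();
        }
        if (previous != null) {
            previous.checkExit(status);
        }
    }

    @Override
    public void checkPermission(Permission perm) {
        // Not overridden in spirit: forward to the previous manager, else allow.
        if (previous != null) {
            previous.checkPermission(perm);
        }
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
        if (previous != null) {
            previous.checkPermission(perm, context);
        }
    }
}
```

In such a sketch, the runtime would wrap each user-code entry point (e.g., main() or StreamTask.invoke) in an enable call followed by a disable call in a finally block, so that exits triggered by the runtime on a fatal error are not blocked.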
> Warn user if System.exit() is called in user code
> -------------------------------------------------
>
>                 Key: FLINK-15156
>                 URL: https://issues.apache.org/jira/browse/FLINK-15156
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Robert Metzger
>            Priority: Minor
>              Labels: starter
>
> It would make debugging Flink errors easier if we would intercept and log
> calls to System.exit() through the SecurityManager.
> A user recently had an error where the JobManager was shutting down because
> of a System.exit() in the user code:
> https://lists.apache.org/thread.html/b28dabcf3068d489f38399c456c80d48569fcdf74b15f8bb95d532d0%40%3Cuser.flink.apache.org%3E
> If I remember correctly, we had such issues before.
> I put this ticket into the "Runtime / Coordination" component, as it is
> mostly about improving the usability / debuggability in that area.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)