[ https://issues.apache.org/jira/browse/FLINK-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180363#comment-17180363 ]

Till Rohrmann commented on FLINK-16408:
---------------------------------------

How many concurrent {{WordCount}} jobs do you have running when the cluster fails? How long does a {{WordCount}} job take to execute? Maybe you could share the cluster logs with us to see what is going on.

> Bind user code class loader to lifetime of a slot
> -------------------------------------------------
>
>                 Key: FLINK-16408
>                 URL: https://issues.apache.org/jira/browse/FLINK-16408
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.2, 1.10.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>         Attachments: Metaspace-OOM.png
>
>
> In order to avoid class leaks due to creating multiple user code class
> loaders and loading classes multiple times in a recovery case, I would
> suggest binding the lifetime of a user code class loader to the lifetime
> of a slot. More precisely, the user code class loader should live at most
> as long as the slot which is using it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)