Borland Won created FLINK-23357:
-----------------------------------

             Summary: jobmanager metaspace oom
                 Key: FLINK-23357
                 URL: https://issues.apache.org/jira/browse/FLINK-23357
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.12.2
            Reporter: Borland Won
         Attachments: image-2021-07-12-16-57-55-256.png, image-2021-07-12-17-06-17-218.png

*Flink Version: 1.12.2*

Hi. I created a standalone HA cluster (with 3 taskmanagers and 2 jobmanagers), and I repeatedly submit new jobs to the cluster and cancel old jobs via the REST API. The metaspace usage of the master jobmanager then keeps increasing.

!image-2021-07-12-16-57-55-256.png!

Eventually it runs out of memory and throws the exception below:

2021-06-21 15:44:06,637 ERROR org.apache.flink.runtime.webmonitor.handlers.JarRunHandler [] - Unhandled exception.
java.util.concurrent.CompletionException: org.apache.flink.client.program.ProgramInvocationException: The program's entry point class 'xxx.xxx.xxx.XXXBootstrap' caused an exception during initialization: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either Flink Master requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'jobmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The Flink Master has to be shutdown...
    at org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$JarHandlerContext.toPackagedProgram(JarHandlerUtils.java:184) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$JarHandlerContext.applyToConfiguration(JarHandlerUtils.java:141) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleRequest(JarRunHandler.java:95) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleRequest(JarRunHandler.java:53) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.AbstractRestHandler.respondToRequest(AbstractRestHandler.java:83) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.AbstractHandler.respondAsLeader(AbstractHandler.java:195) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.lambda$channelRead0$0(LeaderRetrievalHandler.java:83) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at java.util.Optional.ifPresent(Optional.java:159) [?:1.8.0_292]
    at org.apache.flink.util.OptionalConsumer.ifPresent(OptionalConsumer.java:45) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:80) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:49) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.router.RouterHandler.routed(RouterHandler.java:115) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.router.RouterHandler.channelRead0(RouterHandler.java:94) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.handler.router.RouterHandler.channelRead0(RouterHandler.java:55) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:208) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:69) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.11-1.12.2.jar:1.12.2]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Caused by: org.apache.flink.client.program.ProgramInvocationException: The program's entry point class 'xxx.xxx.xxx.XXXBootstrap' caused an exception during initialization: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either Flink Master requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'jobmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The Flink Master has to be shutdown...
    at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:497) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:152) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:64) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:685) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    at org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$JarHandlerContext.toPackagedProgram(JarHandlerUtils.java:182) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
    ... 50 more
Caused by: java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either Flink Master requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'jobmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The Flink Master has to be shutdown...

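
For context on the "class loading leak" wording in the error message, here is a minimal Java sketch of how one class name can end up in metaspace many times. It is not Flink's class loading code. It assumes only that each job submission loads user classes through its own class loader, and it mimics that with a hypothetical `JobLoader`. The names `DuplicateClassDemo`, `JobLoader`, `loadCopies` and the generated class `Payload` are made up for illustration. Every loader that defines the class produces a separate `Class` object, and that object stays in metaspace for as long as anything still references the class or its loader.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class DuplicateClassDemo {

    /** Builds the bytes of an empty public class (class file version 52, Java 8). */
    static byte[] emptyClassBytes(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                            // magic number
            out.writeShort(0);                                   // minor version
            out.writeShort(52);                                  // major version
            out.writeShort(5);                                   // constant pool count (4 entries + 1)
            out.writeByte(7); out.writeShort(2);                 // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);                // #2 Utf8 class name
            out.writeByte(7); out.writeShort(4);                 // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8 superclass name
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                   // this_class = #1
            out.writeShort(3);                                   // super_class = #3
            out.writeShort(0);                                   // no interfaces
            out.writeShort(0);                                   // no fields
            out.writeShort(0);                                   // no methods
            out.writeShort(0);                                   // no attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical stand-in for a per-job user code class loader. */
    static final class JobLoader extends ClassLoader {
        JobLoader() {
            super(DuplicateClassDemo.class.getClassLoader());
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Defines the same class in `count` fresh loaders and keeps every copy reachable. */
    static List<Class<?>> loadCopies(int count) {
        byte[] bytes = emptyClassBytes("Payload");
        List<Class<?>> copies = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Holding the Class in a list also keeps its loader alive:
            // this is the "leak" in miniature.
            copies.add(new JobLoader().define("Payload", bytes));
        }
        return copies;
    }

    public static void main(String[] args) {
        List<Class<?>> copies = loadCopies(5);
        System.out.println(copies.size() + " classes named " + copies.get(0).getName()
                + ", distinct Class objects: " + new HashSet<>(copies).size());
    }
}
```

In the sketch the list returned by `loadCopies` is what keeps the loaders alive. In a real job the retaining reference could be something like a thread that was never stopped or a static field in a class loaded by the parent loader, but that is a guess to verify, not a diagnosis of this report. On JDK 8, `jmap -clstats <pid>` lists class loaders together with a liveness column, which may help narrow down which loaders are being retained.
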
I know that "there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed", but I don't know how to identify it. I ran jmap against the jobmanager process as it was approaching OOM. Bootstrap is my entry class with the main method. Why were it and the other function classes loaded 103 times?

!image-2021-07-12-17-06-17-218.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)