[ 
https://issues.apache.org/jira/browse/FLINK-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041683#comment-17041683
 ] 

Thomas Wozniakowski commented on FLINK-16142:
---------------------------------------------

Hi [~sewen], here is the first chunk of the logs with all the config parts:

{code}
Starting Task Manager
config file: 
jobmanager.rpc.address: pattern-detector-e2e-jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.memory.process.size: 1568m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 1
jobmanager.execution.failover-strategy: region
blob.server.port: 6124
query.server.port: 6125
Starting taskexecutor as a console application on host 1ef836eff98e.
2020-02-21 08:46:50,418 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - 
--------------------------------------------------------------------------------
2020-02-21 08:46:50,422 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  
Preconfiguration: 
2020-02-21 08:46:50,423 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - 


TM_RESOURCES_JVM_PARAMS extraction logs:
 - Loading configuration property: jobmanager.rpc.address, 
pattern-detector-e2e-jobmanager
 - Loading configuration property: jobmanager.rpc.port, 6123
 - Loading configuration property: jobmanager.heap.size, 1024m
 - Loading configuration property: taskmanager.memory.process.size, 1568m
 - Loading configuration property: taskmanager.numberOfTaskSlots, 2
 - Loading configuration property: parallelism.default, 1
 - Loading configuration property: jobmanager.execution.failover-strategy, 
region
 - Loading configuration property: blob.server.port, 6124
 - Loading configuration property: query.server.port, 6125
 - The derived from fraction jvm overhead memory (156.800mb (164416719 bytes)) 
is less than its min value 192.000mb (201326592 bytes), min value will be used 
instead
BASH_JAVA_UTILS_EXEC_RESULT:-Xmx536870902 -Xms536870902 
-XX:MaxDirectMemorySize=268435458 -XX:MaxMetaspaceSize=100663296

TM_RESOURCES_DYNAMIC_CONFIGS extraction logs:
 - Loading configuration property: jobmanager.rpc.address, 
pattern-detector-e2e-jobmanager
 - Loading configuration property: jobmanager.rpc.port, 6123
 - Loading configuration property: jobmanager.heap.size, 1024m
 - Loading configuration property: taskmanager.memory.process.size, 1568m
 - Loading configuration property: taskmanager.numberOfTaskSlots, 2
 - Loading configuration property: parallelism.default, 1
 - Loading configuration property: jobmanager.execution.failover-strategy, 
region
 - Loading configuration property: blob.server.port, 6124
 - Loading configuration property: query.server.port, 6125
 - The derived from fraction jvm overhead memory (156.800mb (164416719 bytes)) 
is less than its min value 192.000mb (201326592 bytes), min value will be used 
instead
BASH_JAVA_UTILS_EXEC_RESULT:-D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.network.max=134217730b -D 
taskmanager.memory.network.min=134217730b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=536870920b -D taskmanager.cpu.cores=2.0 -D 
taskmanager.memory.task.heap.size=402653174b -D 
taskmanager.memory.task.off-heap.size=0b 

2020-02-21 08:46:50,423 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - 
--------------------------------------------------------------------------------
2020-02-21 08:46:50,424 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Starting 
TaskManager (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
2020-02-21 08:46:50,425 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  OS current 
user: flink
2020-02-21 08:46:50,426 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current 
Hadoop/Kerberos user: <no hadoop dependency found>
2020-02-21 08:46:50,426 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 
64-Bit Server VM - Oracle Corporation - 1.8/25.242-b08
2020-02-21 08:46:50,426 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap 
size: 512 MiBytes
2020-02-21 08:46:50,427 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: 
/usr/local/openjdk-8
2020-02-21 08:46:50,427 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  No Hadoop 
Dependency available
2020-02-21 08:46:50,428 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM Options:
2020-02-21 08:46:50,428 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:+UseG1GC
2020-02-21 08:46:50,428 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
-Xmx536870902
2020-02-21 08:46:50,428 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
-Xms536870902
2020-02-21 08:46:50,429 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
-XX:MaxDirectMemorySize=268435458
2020-02-21 08:46:50,429 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
-XX:MaxMetaspaceSize=100663296
2020-02-21 08:46:50,429 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
-Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2020-02-21 08:46:50,429 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
-Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2020-02-21 08:46:50,430 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Program 
Arguments:
2020-02-21 08:46:50,430 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     --configDir
2020-02-21 08:46:50,430 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
/opt/flink/conf
2020-02-21 08:46:50,430 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,431 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.memory.framework.off-heap.size=134217728b
2020-02-21 08:46:50,431 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,431 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.memory.network.max=134217730b
2020-02-21 08:46:50,432 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,432 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.memory.network.min=134217730b
2020-02-21 08:46:50,432 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,432 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.memory.framework.heap.size=134217728b
2020-02-21 08:46:50,433 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,433 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.memory.managed.size=536870920b
2020-02-21 08:46:50,433 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,434 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.cpu.cores=2.0
2020-02-21 08:46:50,434 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,434 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.memory.task.heap.size=402653174b
2020-02-21 08:46:50,434 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -D
2020-02-21 08:46:50,435 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     
taskmanager.memory.task.off-heap.size=0b
2020-02-21 08:46:50,435 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Classpath: 
/opt/flink/lib/flink-table-blink_2.11-1.10.0.jar:/opt/flink/lib/flink-table_2.11-1.10.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.11-1.10.0.jar:::
2020-02-21 08:46:50,435 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - 
--------------------------------------------------------------------------------
2020-02-21 08:46:50,438 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Registered UNIX 
signal handlers for [TERM, HUP, INT]
2020-02-21 08:46:50,448 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Maximum number 
of open file descriptors is 1048576.
2020-02-21 08:46:50,487 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: jobmanager.rpc.address, pattern-detector-e2e-jobmanager
2020-02-21 08:46:50,487 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: jobmanager.rpc.port, 6123
2020-02-21 08:46:50,487 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: jobmanager.heap.size, 1024m
2020-02-21 08:46:50,488 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: taskmanager.memory.process.size, 1568m
2020-02-21 08:46:50,489 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: taskmanager.numberOfTaskSlots, 2
2020-02-21 08:46:50,489 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: parallelism.default, 1
2020-02-21 08:46:50,490 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: jobmanager.execution.failover-strategy, region
2020-02-21 08:46:50,492 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: blob.server.port, 6124
2020-02-21 08:46:50,493 INFO  
org.apache.flink.configuration.GlobalConfiguration            - Loading 
configuration property: query.server.port, 6125
{code}
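
As a sanity check on the figures in the log above, here is a minimal Python sketch that adds up the logged memory components. It is not part of Flink and the variable names are mine. The 0.1 overhead fraction is an assumption inferred from the 156.800mb figure in the log, not something set in the config file.

```python
# Byte values copied from the BASH_JAVA_UTILS_EXEC_RESULT lines in the log above.
MIB = 1024 * 1024
process_size = 1568 * MIB  # taskmanager.memory.process.size: 1568m

framework_heap = 134217728      # taskmanager.memory.framework.heap.size
task_heap = 402653174           # taskmanager.memory.task.heap.size
framework_off_heap = 134217728  # taskmanager.memory.framework.off-heap.size
task_off_heap = 0               # taskmanager.memory.task.off-heap.size
network = 134217730             # taskmanager.memory.network.min / .max
managed = 536870920             # taskmanager.memory.managed.size
metaspace = 100663296           # -XX:MaxMetaspaceSize
jvm_overhead = 201326592        # the 192.000mb minimum reported in the log

# Heap components should add up to -Xmx / -Xms.
heap = framework_heap + task_heap

# Direct-memory components should add up to -XX:MaxDirectMemorySize.
direct = framework_off_heap + task_off_heap + network

# Everything together should account for the whole process size.
total = heap + direct + managed + metaspace + jvm_overhead

# Assumed fraction of 0.1 (inferred from the log, see note above).
overhead_from_fraction = process_size // 10

print("heap bytes:", heap)
print("direct bytes:", direct)
print("metaspace MiB:", metaspace // MIB)
print("total equals process size:", total == process_size)
print("fraction-derived overhead below minimum:", overhead_from_fraction < jvm_overhead)
```

The sums match the -Xmx and -XX:MaxDirectMemorySize values in the log, and the Metaspace limit works out to 96 MiB, which is the limit the job hits.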

> Memory Leak causes Metaspace OOM error on repeated job submission
> -----------------------------------------------------------------
>
>                 Key: FLINK-16142
>                 URL: https://issues.apache.org/jira/browse/FLINK-16142
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.10.0
>            Reporter: Thomas Wozniakowski
>            Priority: Blocker
>             Fix For: 1.10.1, 1.11.0
>
>
> Hi Guys,
> We've just tried deploying 1.10.0 as it has lots of shiny stuff that fits our 
> use-case exactly (RocksDB state backend running in a containerised cluster). 
> Unfortunately, it seems like there is a memory leak somewhere in the job 
> submission logic. We are getting this error:
> {code:java}
> 2020-02-18 10:22:10,020 INFO 
> org.apache.flink.runtime.executiongraph.ExecutionGraph - OPERATOR_NAME 
> switched from RUNNING to FAILED.
> java.lang.OutOfMemoryError: Metaspace
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at 
> org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:60)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.jmx.SdkMBeanRegistrySupport.registerMetricAdminMBean(SdkMBeanRegistrySupport.java:27)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:398)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.metrics.AwsSdkMetrics.<clinit>(AwsSdkMetrics.java:359)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:728)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:660)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:652)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:611)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:606)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.executeListShards(AmazonKinesisClient.java:1534)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.listShards(AmazonKinesisClient.java:1528)
> at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.listShards(KinesisProxy.java:439)
> at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.getShardsOfStream(KinesisProxy.java:389)
> at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.getShardList(KinesisProxy.java:279)
> at 
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher.discoverNewShardsToSubscribe(KinesisDataFetcher.java:686)
> at 
> org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.run(FlinkKinesisConsumer.java:287)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
> {code}
> (The only change to the text above is OPERATOR_NAME, which replaces some 
> internal specifics of our system.)
> This reliably happens on a fresh cluster after submitting and cancelling 
> our job 3 times.
> We are using the presto-s3 plugin, the CEP library and the Kinesis connector.
> Please let me know what other diagnostics would be useful.
> Tom



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
