Hi Madan,

The error shows that the JVM cannot create new native threads. One common cause is that the machine is configured with too low a thread limit (see this SO question for details: <https://stackoverflow.com/questions/59409163/java-lang-outofmemoryerror-unable-to-create-new-native-thread-executing-seleniu>), but in your case it looks more like a thread leak.

Since the failure recurs on a regular schedule, you could use shell commands such as 'top -H -p <pid>' or 'ps -T -p <pid>' to watch the thread usage of your TaskManager (TM) process, or use jstack to get more information about the leak. These usually help you locate the code that keeps spawning threads.

As for the heap dump issue, this looks like by-design JDK behavior: -XX:+HeapDumpOnOutOfMemoryError only writes a dump when an allocation in the Java heap fails, and failing to create a native thread is not a heap allocation failure. See https://bugs.openjdk.org/browse/JDK-6784422 .
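A minimal shell sketch of these checks, assuming a Linux host with /proc, a JDK on the PATH (for jstack), and that you run it as the same user as the TM process. The function names are made up for illustration. It prints the thread limits that apply to the process, samples its thread count over time, and groups a jstack dump by thread name (stripping numeric suffixes) so that a group of threads that keeps growing stands out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for triaging a native-thread leak in a TaskManager JVM.
# Assumes Linux /proc and jstack from a JDK on the PATH.

# Show the limits that bound native thread creation, plus current usage.
show_thread_limits() {
  local pid="$1"
  # RLIMIT_NPROC: processes + threads allowed for the process's user.
  grep 'Max processes' /proc/"$pid"/limits
  # System-wide ceilings on threads and PIDs.
  echo "kernel.threads-max: $(cat /proc/sys/kernel/threads-max)"
  echo "kernel.pid_max:     $(cat /proc/sys/kernel/pid_max)"
  # Threads currently owned by this process.
  grep '^Threads:' /proc/"$pid"/status
}

# Print "HH:MM:SS <thread count>" every <interval> seconds while the process
# exists. A leak shows up as a count that keeps climbing instead of levelling off.
watch_thread_count() {
  local pid="$1" interval="${2:-60}"
  while [ -d /proc/"$pid" ]; do
    printf '%s %s\n' "$(date +%T)" "$(awk '/^Threads:/ {print $2}' /proc/"$pid"/status)"
    sleep "$interval"
  done
}

# Read a jstack dump on stdin and count threads per name, with trailing
# numeric suffixes such as "-17" or "#3" removed so pool members collapse.
summarize_threads() {
  grep -E '^"' \
    | sed -E 's/^"([^"]*)".*/\1/' \
    | sed -E 's/[-#_ ]*[0-9]+$//' \
    | sort | uniq -c | sort -rn
}

# Example (replace 12345 with the TM PID):
#   show_thread_limits 12345
#   watch_thread_count 12345 60
#   jstack 12345 | summarize_threads | head -20
```

If one name group keeps growing between dumps, the full jstack output for those threads shows the stack that started them.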
Best,
Biao Geng

Madan D via user <user@flink.apache.org> wrote on Wed, May 15, 2024 at 09:43:
> Hello Team,
>
> Good morning!
> We have been running a flink job with Kafka where it gets restarted every
> 2 hours with an Out of Memory Exception. We tried to increase task manager
> memory and reduce parallelism and having rate limit to reduce consumption
> rate, but irrespectively, it restarts every 1-2 hours.
>
> I see the job application running fine if the payload size is smaller and
> failing if it is nearly 700–900 KB.
>
> I tried to enable heap memory dump to see if there were any leaks, but I
> am not able to see any files being generated.
>
> Can someone help here?
>
> Flink Command to generate heap dump which is not working
>
> flink run-application -t yarn-application -Drest.flamegraph.enabled=true
> -Denv.java.opts.all="-XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/logs/flink/kafka-to-pubsub-test.hprof"
> -Dyarn.application.name=idx-pfm-user-financial-data-kafka-to-pubsub-prd
> -Dtaskmanager.memory.process.size=4g -Dtaskmanager.numberOfTaskSlots=1
> -Djobmanager.memory.process.size=4g -c KafkaToPubSubJob test.jar
>
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method) ~[?:1.8.0_412]
>     at java.lang.Thread.start(Thread.java:719) ~[?:1.8.0_412]
>     at org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96) ~[test.jar:?]
>
>
> Regards,
> Madan