The Java options should not be wrapped in double quotes; that was the issue. With the quotes removed I was able to generate the heap dump and, based on the dump, I have made some changes in the code to fix the problem.
This worked -

env.java.opts: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dump.hprof

Thanks.

On Mon, 8 Mar 2021 at 7:48 AM, Xintong Song <tonysong...@gmail.com> wrote:

> Hi Hemant,
> I don't see any problem in your settings. Any exceptions suggesting why TM
> containers are not coming up?
>
> Thank you~
>
> Xintong Song
>
>
> On Sat, Mar 6, 2021 at 3:53 PM bat man <tintin0...@gmail.com> wrote:
>
>> Hi Xintong Song,
>> I tried using the java options to generate heap dump referring to docs[1]
>> in flink-conf.yaml, however after adding this the task manager containers
>> are not coming up. Note that I am using EMR. Am i doing anything wrong here?
>>
>> env.java.opts: "-XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=/tmp/dump.hprof"
>>
>> Thanks,
>> Hemant
>>
>>
>> On Fri, Mar 5, 2021 at 3:05 PM Xintong Song <tonysong...@gmail.com>
>> wrote:
>>
>>> Hi Hemant,
>>>
>>> This exception generally suggests that JVM is running out of heap
>>> memory. Per the official documentation [1], the amount of live data barely
>>> fits into the Java heap having little free space for new allocations.
>>>
>>> You can try to increase the heap size following these guides [2].
>>>
>>> If a memory leak is suspected, to further understand where the memory is
>>> consumed, you may need to dump the heap on OOMs and looking for unexpected
>>> memory usages leveraging profiling tools.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>> [1]
>>> https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html
>>>
>>> [2]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup.html
>>>
>>>
>>> On Fri, Mar 5, 2021 at 4:24 PM bat man <tintin0...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Getting the below OOM but the job failed 4-5 times and recovered from
>>>> there.
>>>>
>>>> java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.checkThrowSourceExecutionException(SourceStreamTask.java:212)
>>>>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.performDefaultAction(SourceStreamTask.java:132)
>>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:298)
>>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:403)
>>>>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>>>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>>>>     at java.lang.Thread.run(Thread.java:748)
>>>> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>
>>>> Is there any way I can debug this. since the job after a few re-starts
>>>> started running fine. what could be the reason behind this.
>>>>
>>>> Thanks,
>>>> Hemant
>>>>
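
A minimal sketch of the flink-conf.yaml entries discussed in this thread, assuming a Flink 1.12 setup like the one in the linked docs. The heap dump options are the ones reported to work above (unquoted, on a single line). The memory value is a hypothetical placeholder, not a recommendation, and would need to be sized for the actual job.

```yaml
# JVM options passed to the Flink processes. Written without surrounding
# double quotes and on one line, which is the form reported to work on EMR.
env.java.opts: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dump.hprof

# Assumption: 4g is an illustrative value only. Raising the total TaskManager
# process memory is one way to give the JVM heap more room, per the memory
# setup guide linked as [2] in the thread.
taskmanager.memory.process.size: 4g
```

With these options, the dump file should land on the local filesystem of the node running the TaskManager that hit the OOM, and it can then be opened in a heap analysis tool to look for unexpected memory usage.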