Hi Timo,

Yes, we are using off-heap memory. Our YARN containers are set to use ~23 GB of memory with two slots per container, and the YARN heap cutoff ratio is set to 0.6.
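For reference, the relevant settings look roughly like this (a sketch of our setup; exact values and flags may differ slightly):

    # flink-conf.yaml (Flink 1.3.2)
    taskmanager.memory.off-heap: true
    taskmanager.numberOfTaskSlots: 2
    yarn.heap-cutoff-ratio: 0.6

    # TaskManager containers are requested with ~23 GB, e.g. via the CLI:
    #   flink run -m yarn-cluster -ytm 23000 ...

With a cutoff ratio of 0.6, a ~23 GB container leaves about 9.4 GB for the JVM heap, which roughly matches the 9370m heap visible in the JVM options below.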
The jobs show normal memory usage; the problem here is not a temporary halt but a permanent halt of the running jobs.

TaskManager log:

2018-02-08 16:55:31,007 INFO  org.apache.flink.yarn.YarnTaskManagerRunner - JVM Options:
2018-02-08 16:55:31,007 INFO  org.apache.flink.yarn.YarnTaskManagerRunner -     -Xms9370m
2018-02-08 16:55:31,007 INFO  org.apache.flink.yarn.YarnTaskManagerRunner -     -Xmx9370m

GC runs and memory usage on one of the affected TaskManagers:

Garbage Collection
  Collector      Count     Time
  PS_Scavenge    22,673    702,544
  PS_MarkSweep   143       77,431

Memory: JVM (Heap/Non-Heap)
  Type       Committed   Used      Maximum
  Heap       9.11 GB     6.23 GB   9.11 GB
  Non-Heap   1.73 GB     1.67 GB   -1 B
  Total      10.8 GB     7.90 GB   9.11 GB

--
Thanks,
Amit

On Mon, Feb 12, 2018 at 9:50 PM, Timo Walther <twal...@apache.org> wrote:

> Hi Amit,
>
> How is the memory consumption when the jobs get stuck? Is the Java GC
> active? Are you using off-heap memory?
>
> Regards,
> Timo
>
> On 2/12/18 at 10:10 AM, Amit Jain wrote:
>
>> Hi,
>>
>> We have created a batch job where we are trying to merge a set of S3
>> directories in text format with the old snapshot in Parquet format.
>>
>> We are running 50 such jobs daily and have found that the progress of a
>> few random jobs gets stuck in between. We have gone through the logs of
>> the JobManager and TaskManagers and could not find any useful
>> information there.
>>
>> The important operators involved are: read using TextInputFormat, read
>> using HadoopInputFormat, FullOuterJoin, and write using our
>> BucketingSink code.
>>
>> Please help resolve this issue.
>>
>> Flink version 1.3.2, deployed in YARN containers.
>>
>> --
>> Thanks,
>> Amit
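P.S. For anyone trying to picture the shape of these merge jobs, here is a heavily simplified sketch. It is not our actual code: the paths and key extraction are placeholders, the Parquet read via HadoopInputFormat is modelled as a second keyed text source, and writeAsCsv stands in for our custom BucketingSink-style writer.

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class SnapshotMergeSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Delta: new data arriving as text files on S3 (placeholder path).
        DataSet<Tuple2<String, String>> delta = env
                .readTextFile("s3://bucket/incoming/")
                .map(new KeyedLine());

        // Snapshot: in the real job this is Parquet read via HadoopInputFormat;
        // modelled here as another keyed text source to keep the sketch self-contained.
        DataSet<Tuple2<String, String>> snapshot = env
                .readTextFile("s3://bucket/snapshot/")
                .map(new KeyedLine());

        // Full outer join on the key so new, updated and unchanged records all survive.
        DataSet<Tuple2<String, String>> merged = delta
                .fullOuterJoin(snapshot)
                .where(0)
                .equalTo(0)
                .with(new PreferDelta());

        // In the real job a custom BucketingSink-style writer outputs the result.
        merged.writeAsCsv("s3://bucket/merged/");

        env.execute("snapshot-merge-sketch");
    }

    /** Extracts a record key from a raw line; the split logic is a placeholder. */
    public static class KeyedLine implements MapFunction<String, Tuple2<String, String>> {
        @Override
        public Tuple2<String, String> map(String line) {
            int idx = line.indexOf(',');
            String key = idx > 0 ? line.substring(0, idx) : line;
            return new Tuple2<>(key, line);
        }
    }

    /** In a full outer join either side may be null; keep the newer (delta) record if present. */
    public static class PreferDelta
            implements JoinFunction<Tuple2<String, String>, Tuple2<String, String>, Tuple2<String, String>> {
        @Override
        public Tuple2<String, String> join(Tuple2<String, String> newer, Tuple2<String, String> older) {
            return newer != null ? newer : older;
        }
    }
}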