Hi Timo,

Yes, we are using off-heap memory. Our YARN containers are set to use ~23 GB of memory with two slots per container, and the YARN heap cutoff ratio is set to 0.6.
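For reference, the relevant settings look roughly like this (a sketch of our setup; exact values and flags may differ slightly):

    # flink-conf.yaml (Flink 1.3.2)
    taskmanager.memory.off-heap: true
    taskmanager.numberOfTaskSlots: 2
    yarn.heap-cutoff-ratio: 0.6

    # TaskManager containers are requested with ~23 GB, e.g. via the CLI:
    #   flink run -m yarn-cluster -ytm 23000 ...

With a cutoff ratio of 0.6, a ~23 GB container leaves about 9.4 GB for the JVM heap, which roughly matches the 9370m heap visible in the JVM options below.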
The jobs show normal memory usage; the problem here is not a temporary halt but a permanent halt of the running jobs.

TaskManager log:

2018-02-08 16:55:31,007 INFO  org.apache.flink.yarn.YarnTaskManagerRunner - JVM Options:
2018-02-08 16:55:31,007 INFO  org.apache.flink.yarn.YarnTaskManagerRunner -     -Xms9370m
2018-02-08 16:55:31,007 INFO  org.apache.flink.yarn.YarnTaskManagerRunner -     -Xmx9370m

GC runs and memory usage on one of the affected TaskManagers:

Garbage Collection
  Collector      Count     Time
  PS_Scavenge    22,673    702,544
  PS_MarkSweep   143       77,431

Memory: JVM (Heap/Non-Heap)
  Type       Committed   Used      Maximum
  Heap       9.11 GB     6.23 GB   9.11 GB
  Non-Heap   1.73 GB     1.67 GB   -1 B
  Total      10.8 GB     7.90 GB   9.11 GB

--
Thanks,
Amit

On Mon, Feb 12, 2018 at 9:50 PM, Timo Walther <twal...@apache.org> wrote:

> Hi Amit,
>
> How is the memory consumption when the jobs get stuck? Is the Java GC
> active? Are you using off-heap memory?
>
> Regards,
> Timo
>
> On 2/12/18 at 10:10 AM, Amit Jain wrote:
>
>> Hi,
>>
>> We have created a batch job where we are trying to merge a set of S3
>> directories in text format with the old snapshot in Parquet format.
>>
>> We are running 50 such jobs daily and have found that the progress of a
>> few random jobs gets stuck in between. We have gone through the logs of
>> the JobManager and TaskManagers and could not find any useful
>> information there.
>>
>> The important operators involved are: read using TextInputFormat, read
>> using HadoopInputFormat, FullOuterJoin, and write using our
>> BucketingSink code.
>>
>> Please help resolve this issue.
>>
>> Flink version 1.3.2, deployed in YARN containers.
>>
>> --
>> Thanks,
>> Amit
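P.S. For anyone trying to picture the shape of these merge jobs, here is a heavily simplified sketch. It is not our actual code: the paths and key extraction are placeholders, the Parquet read via HadoopInputFormat is modelled as a second keyed text source, and writeAsCsv stands in for our custom BucketingSink-style writer.

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class SnapshotMergeSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Delta: new data arriving as text files on S3 (placeholder path).
        DataSet<Tuple2<String, String>> delta = env
                .readTextFile("s3://bucket/incoming/")
                .map(new KeyedLine());

        // Snapshot: in the real job this is Parquet read via HadoopInputFormat;
        // modelled here as another keyed text source to keep the sketch self-contained.
        DataSet<Tuple2<String, String>> snapshot = env
                .readTextFile("s3://bucket/snapshot/")
                .map(new KeyedLine());

        // Full outer join on the key so new, updated and unchanged records all survive.
        DataSet<Tuple2<String, String>> merged = delta
                .fullOuterJoin(snapshot)
                .where(0)
                .equalTo(0)
                .with(new PreferDelta());

        // In the real job a custom BucketingSink-style writer outputs the result.
        merged.writeAsCsv("s3://bucket/merged/");

        env.execute("snapshot-merge-sketch");
    }

    /** Extracts a record key from a raw line; the split logic is a placeholder. */
    public static class KeyedLine implements MapFunction<String, Tuple2<String, String>> {
        @Override
        public Tuple2<String, String> map(String line) {
            int idx = line.indexOf(',');
            String key = idx > 0 ? line.substring(0, idx) : line;
            return new Tuple2<>(key, line);
        }
    }

    /** In a full outer join either side may be null; keep the newer (delta) record if present. */
    public static class PreferDelta
            implements JoinFunction<Tuple2<String, String>, Tuple2<String, String>, Tuple2<String, String>> {
        @Override
        public Tuple2<String, String> join(Tuple2<String, String> newer, Tuple2<String, String> older) {
            return newer != null ? newer : older;
        }
    }
}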