One would need to look at your code and possibly at some heap statistics. 
Maybe something goes wrong when you cache the objects (do you use a 3rd-party 
library or your own implementation?). Do you use a stable version of your 
protobuf library (not necessarily the most recent one)? You may also want to 
reuse buffers (ByteBuffer, StringBuffer, etc.) to avoid creating a new object 
for every message.
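
For example, something along these lines (RecordFormatter is just a made-up 
helper to illustrate the pattern; here I use StringBuilder, the unsynchronized 
variant):

    // Hypothetical helper: reuse one StringBuilder per operator instance
    // instead of allocating a new one for every message.
    public class RecordFormatter {
        private final StringBuilder sb = new StringBuilder(1024); // allocated once

        public String format(String key, long count) {
            sb.setLength(0);                          // reset, no new allocation
            sb.append(key).append(',').append(count);
            return sb.toString();                     // only the result is allocated
        }
    }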

Probably you are creating a lot of objects during the conversion into POJOs. 
You could increase the heap size of the young generation. You could also 
switch to the G1 garbage collector (if you are on JDK 8) or at least to the 
parallel collector.
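
A sketch of how that could look in flink-conf.yaml (the concrete flag values 
are assumptions; tune them for your setup):

    # Pass GC flags to the Flink JVMs; switch to G1 (JDK 8):
    env.java.opts: -XX:+UseG1GC -XX:MaxGCPauseMillis=200

    # Or keep the parallel collector and enlarge the young generation:
    # env.java.opts: -XX:+UseParallelGC -Xmn4g
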
Generally, you should avoid creating POJOs/objects as much as possible in a 
long-running streaming job.
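
As an illustration, a minimal sketch of reusing one output object per parallel 
task (SensorProto and Metric are hypothetical types standing in for your 
protobuf message and POJO; returning the same instance is only safe because 
you already enabled object reuse and as long as downstream operators do not 
buffer references):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    // Hypothetical conversion step: protobuf message -> mutable POJO.
    public class ProtoToMetric extends RichMapFunction<SensorProto, Metric> {
        private transient Metric reused;

        @Override
        public void open(Configuration parameters) {
            reused = new Metric();            // allocated once per parallel task
        }

        @Override
        public Metric map(SensorProto msg) {
            reused.setKey(msg.getKey());      // overwrite fields in place
            reused.setValue(msg.getValue());
            return reused;                    // no per-record allocation
        }
    }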

> On 21. Aug 2017, at 05:29, Govindarajan Srinivasaraghavan 
> <govindragh...@gmail.com> wrote:
> 
> Hi,
> 
> I have a pipeline running on Flink which ingests around 6k messages per 
> second. Each message is around 1kb and passes through various stages like a 
> filter, a 5-second tumbling window per key, etc., and finally a flatMap 
> computation before the results are sent to a Kafka sink. The data is first 
> ingested as protocol buffers and then converted into POJOs in the subsequent 
> operators.
> 
> There are lots of objects created inside the user functions, and some of 
> them are cached as well. I have been running this pipeline on 48 task slots 
> across 3 task managers, each allocated 22GB of memory.
> 
> The issue I'm having is that within a period of 10 hours, almost 19k 
> young-generation GCs have run, which is roughly one every 2 seconds, and 
> the "GC time taken" value is more than 2 million. I have also enabled 
> object reuse. Any suggestions on how this issue could be resolved? Thanks.
> 
> Regards,
> Govind
> 
