Thanks Stephan, any pointers on how managed memory is used in a streaming application would really help.
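For reference, in the Flink releases current at the time of this thread (the 1.3.x line), the managed memory share is controlled in flink-conf.yaml. A sketch with illustrative values; these keys have since been reworked in later Flink versions, so check the documentation for your release:

```yaml
# Fraction of TaskManager memory reserved as managed memory for Flink's
# internal operators (default 0.7). In a pure streaming job using the
# heap or RocksDB state backends, this memory largely goes unused.
taskmanager.memory.fraction: 0.7

# Alternatively, an absolute amount in MB; if set, it overrides the fraction.
# taskmanager.memory.size: 2048

# Whether managed memory is allocated off-heap (default: false).
taskmanager.memory.off-heap: false
```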
Regards,
Govind

> On Aug 24, 2017, at 1:53 AM, Stephan Ewen <se...@apache.org> wrote:
>
> Hi!
>
> RocksDB will be used when it is selected as the state backend, independent of
> the checkpointing configuration.
>
> Using RocksDB as the state backend, Flink will have some objects on the heap,
> like timers (we will move them to RocksDB as well in the near future), but the
> majority will be off-heap.
>
> Stephan
>
>> On Thu, Aug 24, 2017 at 5:28 AM, Govindarajan Srinivasaraghavan
>> <govindragh...@gmail.com> wrote:
>>
>> I have a couple more questions regarding Flink's JVM memory.
>>
>> In a streaming application, what is managed memory used for? I read in a
>> blog that all objects created inside the user functions go into unmanaged
>> memory. Where does the managed keyed/operator state reside?
>>
>> Also, when does the state get persisted into RocksDB? Is it only when
>> checkpointing is enabled? If the state backend is RocksDB but checkpointing
>> is not enabled, what will happen?
>>
>> Thanks.
>>
>>> On Sun, Aug 20, 2017 at 11:14 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>> One would need to look at your code and possibly at some heap statistics.
>>> Maybe something goes wrong when you cache the objects (do you use a
>>> 3rd-party library or your own implementation?). Do you use a stable version
>>> of your protobuf library (not necessarily the most recent)? You may also
>>> want to look at buffers to avoid creating objects (ByteBuffer, StringBuffer,
>>> etc.).
>>>
>>> You are probably creating a lot of objects due to the conversion into
>>> POJOs. You could increase the heap for the Java objects of the young
>>> generation. You can also switch to the G1 garbage collector (if on JDK 8),
>>> or at least the parallel one.
>>> Generally, you should avoid creating POJOs/objects as much as possible in
>>> a long-running streaming job.
>>>
>>> > On 21. Aug 2017, at 05:29, Govindarajan Srinivasaraghavan
>>> > <govindragh...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I have a pipeline running on Flink which ingests around 6k messages per
>>> > second. Each message is around 1 KB and passes through various stages
>>> > such as a filter, a 5-second tumbling window per key, etc., and finally
>>> > a flatMap computation before the Kafka sink. The data is first ingested
>>> > as protocol buffers and then converted into POJOs in the subsequent
>>> > operators.
>>> >
>>> > Lots of objects are created inside the user functions, and some of them
>>> > are cached as well. I have been running this pipeline on 48 task slots
>>> > across 3 task managers, each allocated 22 GB of memory.
>>> >
>>> > The issue I'm having is that within a period of 10 hours, almost 19k
>>> > young-generation GCs have run, which is roughly one every 2 seconds, and
>>> > the GC time-taken value is more than 2 million. I have also enabled
>>> > object reuse. Any suggestions on how this issue could be resolved?
>>> > Thanks.
>>> >
>>> > Regards,
>>> > Govind
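The allocation-avoidance pattern Jörn recommends can be sketched in plain Java: instead of building a fresh POJO per message, an operator keeps one mutable instance and overwrites its fields. All names here are illustrative, not from the thread; this is safe only if downstream code does not hold on to the reference, which is the same assumption Flink's `ExecutionConfig.enableObjectReuse()` mode makes.

```java
// Sketch of per-operator object reuse to reduce young-generation GC pressure.
public class ReusablePojoDemo {

    // Mutable POJO reused across messages instead of allocated per message.
    static final class Measurement {
        long timestamp;
        double value;

        Measurement set(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
            return this;
        }
    }

    // One instance per operator/parser, overwritten on every call.
    private final Measurement reused = new Measurement();

    // Parses a "timestamp,value" line into the reused instance; no new
    // Measurement is allocated per message.
    Measurement parse(String line) {
        int comma = line.indexOf(',');
        long ts = Long.parseLong(line.substring(0, comma));
        double v = Double.parseDouble(line.substring(comma + 1));
        return reused.set(ts, v);
    }

    public static void main(String[] args) {
        ReusablePojoDemo demo = new ReusablePojoDemo();
        Measurement m1 = demo.parse("1503298169000,42.5");
        Measurement m2 = demo.parse("1503298170000,43.0");
        // Same object both times; only the field values changed.
        System.out.println(m1 == m2);  // prints "true"
        System.out.println(m2.value);  // prints "43.0"
    }
}
```

The same idea applies to the protobuf-to-POJO conversion described above: reusing one target object per operator instance removes one short-lived allocation per message, which at 6k messages/second adds up quickly in the young generation.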