I agree with Ufuk: it would be helpful to know which stateful operations the jobs use (including windowing).
> On 7. Nov 2017, at 14:53, Ufuk Celebi <u...@apache.org> wrote:
>
> Do you use any windowing? If yes, could you please share that code? If
> there is no stateful operation at all, it's strange where the list
> state instances are coming from.
>
> On Tue, Nov 7, 2017 at 2:35 PM, ebru <b20926...@cs.hacettepe.edu.tr> wrote:
>> Hi Ufuk,
>>
>> We don't explicitly define any state descriptors. We only use map and
>> filter operators. We thought that GC would handle clearing Flink's
>> internal state. So how can we manage the memory if it is always increasing?
>>
>> - Ebru
>>
>> On 7 Nov 2017, at 16:23, Ufuk Celebi <u...@apache.org> wrote:
>>
>> Hey Ebru, the memory usage might be increasing as long as a job is running.
>> This is expected (also in the case of multiple running jobs). The
>> screenshots are not helpful in that regard. :-(
>>
>> What kind of stateful operations are you using? Depending on your use case,
>> you have to manually call `clear()` on the state instance in order to
>> release the managed state.
>>
>> Best,
>>
>> Ufuk
>>
>> On Tue, Nov 7, 2017 at 12:43 PM, ebru <b20926...@cs.hacettepe.edu.tr> wrote:
>>>
>>> Begin forwarded message:
>>>
>>> From: ebru <b20926...@cs.hacettepe.edu.tr>
>>> Subject: Re: Flink memory leak
>>> Date: 7 November 2017 at 14:09:17 GMT+3
>>> To: Ufuk Celebi <u...@apache.org>
>>>
>>> Hi Ufuk,
>>>
>>> There are three snapshots of htop output:
>>> 1. The initial state.
>>> 2. After one job is submitted.
>>> 3. The same job running at 15,000 EPS; memory usage keeps increasing
>>>    over time.
>>>
>>> <1.png><2.png><3.png>
>>>
>>> On 7 Nov 2017, at 13:34, Ufuk Celebi <u...@apache.org> wrote:
>>>
>>> Hey Ebru,
>>>
>>> let me pull in Aljoscha (CC'd) who might have an idea what's causing this.
>>>
>>> Since multiple jobs are running, it will be hard to tell which job the
>>> state descriptors from the heap snapshot belong to.
>>> - Is it possible to isolate the problem and reproduce the behaviour
>>>   with only a single job?
>>>
>>> – Ufuk
>>>
>>> On Tue, Nov 7, 2017 at 10:27 AM, ÇETİNKAYA EBRU ÇETİNKAYA EBRU
>>> <b20926...@cs.hacettepe.edu.tr> wrote:
>>>
>>> Hi,
>>>
>>> We are using Flink 1.3.1 in production, with one job manager and 3 task
>>> managers in standalone mode. Recently, we've noticed memory-related
>>> problems. We run the Flink cluster in Docker containers. We have 300
>>> slots, and 20 jobs are running with a parallelism of 10; the job count
>>> may change over time. Task manager memory usage always increases, and
>>> after job cancellation it does not decrease. We tried to investigate
>>> the problem and took a task manager JVM heap snapshot. According to
>>> the heap analysis, the possible memory leak was a Flink list state
>>> descriptor, but we are not sure that is the cause of our memory
>>> problem. How can we solve it?
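To illustrate Ufuk's point about `clear()`: Flink's managed keyed state behaves, at a high level, like a per-key map maintained by the framework, and an entry stays on the heap until `clear()` is called for that key. Below is a rough plain-Java toy model of that behaviour (this is NOT the Flink API; the class and method names are made up for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of managed keyed state: one entry per key, retained until cleared.
public class KeyedStateSketch {
    private final Map<String, Long> countState = new HashMap<>();

    // Analogous to updating per-key state in a keyed operator.
    public void process(String key) {
        countState.merge(key, 1L, Long::sum);
    }

    // Analogous to calling clear() on the state for the current key:
    // without this, the entry stays on the heap for the job's lifetime.
    public void clear(String key) {
        countState.remove(key);
    }

    public int retainedKeys() {
        return countState.size();
    }

    public static void main(String[] args) {
        KeyedStateSketch state = new KeyedStateSketch();
        state.process("user-1");
        state.process("user-2");
        state.process("user-1");
        System.out.println(state.retainedKeys()); // 2 keys retained
        state.clear("user-1");
        System.out.println(state.retainedKeys()); // 1 key retained after clear()
    }
}
```

The point of the sketch: if a keyed job never clears state (and no TTL or window eviction removes it), the per-key entries accumulate for as long as the job runs, which matches the ever-growing heap usage described in this thread.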