Chris,

What version of Flink are you using? I also have an issue with slow but
continual memory growth in a windowing function but it seems like the
taskmanager.sh script I'm using already has the -XX+UseG1GC flag set:
https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh#L43

On Mon, May 25, 2020 at 3:31 AM Aljoscha Krettek <aljos...@apache.org>
wrote:

> Just to double check: the issue was resolved by using a different GC?
> Because the default GC was too "lazy". ;-)
>
> Best,
> Aljoscha
>
> On 21.05.20 18:09, Slotterback, Chris wrote:
> > For those who are interested or googling the mail archives in 8 months,
> the issue was garbage collection related.
> >
> > The default 1.8 jvm garbage collector (parallel gc) was being lazy in
> its marking and collection phases and letting the heap build to a level
> that was causing memory exceptions and stalled tms. This app has a lot of
> state, and memory usage well above 10GB at times. The solution was moving
> to the G1 collector which is very aggressive in its young generation
> collection by default, at the cost of some cpu usage and requires some
> tuning, but keeps the memory levels much more stable.
> >
> > On 5/20/20, 9:05 AM, "Slotterback, Chris" <
> chris_slotterb...@comcast.com> wrote:
> >
> >      What I've noticed is that heap memory ends up growing linearly with
> time indefinitely (past 24 hours) until it hits the roof of the allocated
> heap for the task manager, which leads me to believe I am leaking
> somewhere. All of my windows have an allowed lateness of 5 minutes, and my
> watermarks are pulled from time embedded in the records using
> BoundedOutOfOrdernessTimestampExtractors. My TumblingEventTimeWindows and
> SlidingEventTimeWindow all use AggregateFunctions, and my intervalJoins use
> ProcessJoinFunctions.
> >
> >      I expect this app to use a significant amount of memory at scale
> due to the 288 5-minute intervals in 24 hours, and records being put in all
> 288 window states, and as the application runs for 24 hours memory would
> increase as all 288(*unique key) windows build with incoming records, but
> then after 24 hours the memory should stop growing, or at least grow at a
> different rate?
> >
> >      Also of note, we are using a FsStateBackend configuration, and plan
> to move to RocksDBStateBackend, but from what I can tell, this would only
> reduce memory and delay hitting the heap memory capacity, not stall it
> forever?
> >
> >      Thanks
> >      Chris
> >
> >
> >      On 5/18/20, 7:29 AM, "Aljoscha Krettek" <aljos...@apache.org>
> wrote:
> >
> >          On 15.05.20 15:17, Slotterback, Chris wrote:
> >          > My understanding is that while all these windows build their
> memory state, I can expect heap memory to grow for the 24 hour length of
> the SlidingEventTimeWindow, and then start to flatten as the t-24hr window
> frames expire and release back to the JVM. What is actually happening is
> when a constant data source feeds the stream, the heap memory profile grows
> linearly past the 24 hour mark. Could this be a result of a
> misunderstanding of how the window’s memory states are kept, or is my
> assumption correct, and it is more likely I have a leak somewhere?
> >
> >          Will memory keep growing indefinitely? That would indicate a
> bug? What
> >          sort of lateness/watermark settings do you have? What window
> function do
> >          you use? ProcessWindowFunction, or sth that aggregates?
> >
> >          Side note: with sliding windows of 24h/5min you will have a
> "write
> >          amplification" of 24*60/5=288, each record will be in 288
> windows, which
> >          will each be kept in separate state?
> >
> >          Best,
> >          Aljoscha
> >
> >
> >
>
>

Reply via email to