Chris,

What version of Flink are you using? I am also seeing slow but continual memory growth in a windowing function, but the taskmanager.sh script I'm using already appears to set the -XX:+UseG1GC flag: https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/taskmanager.sh#L43
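For anyone wanting to confirm which collector a JVM actually ended up with, rather than relying on the startup script, here is a minimal sketch using the standard java.lang.management API. The class and method names (GcCheck, collectorNames, usesG1) are made up for illustration, and the name check assumes HotSpot's usual bean names ("G1 Young Generation" / "G1 Old Generation" for G1, "PS Scavenge" / "PS MarkSweep" for the parallel collector). It only reports on the JVM it runs in, so to say anything about a TaskManager it would have to be called from inside the job.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink.
class GcCheck {
    // Names of the garbage collectors registered in the current JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Assumption: HotSpot prefixes the G1 collector beans with "G1 ".
    static boolean usesG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names + ", G1 active: " + usesG1(names));
    }
}
```

If this reports the "PS" collectors, the flag is not reaching the JVM, whatever the script says.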
On Mon, May 25, 2020 at 3:31 AM Aljoscha Krettek <aljos...@apache.org> wrote:

> Just to double check: the issue was resolved by using a different GC?
> Because the default GC was too "lazy". ;-)
>
> Best,
> Aljoscha
>
> On 21.05.20 18:09, Slotterback, Chris wrote:
> > For those who are interested or googling the mail archives in 8 months, the issue was garbage collection related.
> >
> > The default 1.8 jvm garbage collector (parallel gc) was being lazy in its marking and collection phases and letting the heap build to a level that was causing memory exceptions and stalled tms. This app has a lot of state, and memory usage well above 10GB at times. The solution was moving to the G1 collector which is very aggressive in its young generation collection by default, at the cost of some cpu usage and requires some tuning, but keeps the memory levels much more stable.
> >
> > On 5/20/20, 9:05 AM, "Slotterback, Chris" <chris_slotterb...@comcast.com> wrote:
> >
> > What I've noticed is that heap memory ends up growing linearly with time indefinitely (past 24 hours) until it hits the roof of the allocated heap for the task manager, which leads me to believe I am leaking somewhere. All of my windows have an allowed lateness of 5 minutes, and my watermarks are pulled from time embedded in the records using BoundedOutOfOrdernessTimestampExtractors. My TumblingEventTimeWindows and SlidingEventTimeWindow all use AggregateFunctions, and my intervalJoins use ProcessJoinFunctions.
> >
> > I expect this app to use a significant amount of memory at scale due to the 288 5-minute intervals in 24 hours, and records being put in all 288 window states, and as the application runs for 24 hours memory would increase as all 288(*unique key) windows build with incoming records, but then after 24 hours the memory should stop growing, or at least grow at a different rate?
> >
> > Also of note, we are using a FsStateBackend configuration, and plan to move to RocksDBStateBackend, but from what I can tell, this would only reduce memory and delay hitting the heap memory capacity, not stall it forever?
> >
> > Thanks
> > Chris
> >
> > On 5/18/20, 7:29 AM, "Aljoscha Krettek" <aljos...@apache.org> wrote:
> >
> > On 15.05.20 15:17, Slotterback, Chris wrote:
> > > My understanding is that while all these windows build their memory state, I can expect heap memory to grow for the 24 hour length of the SlidingEventTimeWindow, and then start to flatten as the t-24hr window frames expire and release back to the JVM. What is actually happening is when a constant data source feeds the stream, the heap memory profile grows linearly past the 24 hour mark. Could this be a result of a misunderstanding of how the window’s memory states are kept, or is my assumption correct, and it is more likely I have a leak somewhere?
> >
> > Will memory keep growing indefinitely? That would indicate a bug? What sort of lateness/watermark settings do you have? What window function do you use? ProcessWindowFunction, or sth that aggregates?
> >
> > Side note: with sliding windows of 24h/5min you will have a "write amplification" of 24*60/5=288, each record will be in 288 windows, which will each be kept in separate state?
> >
> > Best,
> > Aljoscha
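The 24*60/5 = 288 figure from the side note in the thread can be checked with a small standalone sketch. This is not Flink's code, only an illustration under the assumption that sliding windows start on multiples of the slide (no offset); the class and method names are hypothetical. It enumerates the start time of every 24h window, sliding by 5min, that contains a given event timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of sliding-window fan-out, not Flink's implementation.
class SlidingWindowFanout {
    // Start timestamps (ms) of every window [start, start + sizeMs) that
    // contains timestampMs, assuming starts are aligned to multiples of slideMs.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest aligned window start at or before the timestamp.
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = 24L * 60 * 60 * 1000; // 24 hours
        long slide = 5L * 60 * 1000;      // 5 minutes
        long eventTime = 1_590_000_000_000L; // arbitrary example timestamp
        System.out.println(windowStarts(eventTime, size, slide).size()); // prints 288
    }
}
```

So each incoming record touches 288 windows per key, which is the baseline that heap usage should level off at once the first 24 hours (plus allowed lateness) have passed.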