Just to double check: the issue was resolved by using a different GC?
Because the default GC was too "lazy". ;-)
Best,
Aljoscha
On 21.05.20 18:09, Slotterback, Chris wrote:
For those who are interested or googling the mail archives in 8 months, the
issue was garbage collection related.
The default Java 8 JVM garbage collector (Parallel GC) was being lazy in its
marking and collection phases, letting the heap build up to a level that caused
memory exceptions and stalled TMs (task managers). This app has a lot of state,
with memory usage well above 10GB at times. The solution was moving to the G1
collector, which is very aggressive in its young-generation collection by
default; it costs some CPU and requires some tuning, but it keeps memory levels
much more stable.
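For anyone wanting to try the same thing, a minimal sketch of what the switch
can look like in flink-conf.yaml (the env.java.opts.taskmanager key applies to
Flink 1.x; the extra G1 tuning flags are illustrative and should be tuned per
workload):

# flink-conf.yaml: run the TaskManager JVMs with G1 instead of Parallel GC
# (pause target and region size are illustrative values, tune for your workload)
env.java.opts.taskmanager: -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=16m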
On 5/20/20, 9:05 AM, "Slotterback, Chris" <chris_slotterb...@comcast.com>
wrote:
What I've noticed is that heap memory ends up growing linearly with time
indefinitely (past 24 hours) until it hits the roof of the allocated heap for
the task manager, which leads me to believe I am leaking somewhere. All of my
windows have an allowed lateness of 5 minutes, and my watermarks are pulled
from time embedded in the records using
BoundedOutOfOrdernessTimestampExtractors. My TumblingEventTimeWindows and
SlidingEventTimeWindows all use AggregateFunctions, and my intervalJoins use
ProcessJoinFunctions.
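For concreteness, a minimal sketch of that wiring (Flink 1.10-era DataStream
API); the Event type, its getters, the EventSource, and the count aggregate are
hypothetical placeholders rather than the actual job:

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // "Event" and "EventSource" are hypothetical placeholders.
        DataStream<Event> events = env
            .addSource(new EventSource())
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(30)) {
                    @Override
                    public long extractTimestamp(Event e) {
                        return e.getEventTimeMillis(); // event time embedded in the record
                    }
                });

        events
            .keyBy(e -> e.getKey())
            .window(SlidingEventTimeWindows.of(Time.hours(24), Time.minutes(5)))
            .allowedLateness(Time.minutes(5))
            // An AggregateFunction keeps one small accumulator per window
            // instead of buffering every record in window state.
            .aggregate(new AggregateFunction<Event, Long, Long>() {
                @Override public Long createAccumulator() { return 0L; }
                @Override public Long add(Event e, Long acc) { return acc + 1; }
                @Override public Long getResult(Long acc) { return acc; }
                @Override public Long merge(Long a, Long b) { return a + b; }
            })
            .print();

        env.execute("sliding-window-sketch");
    }
}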
I expect this app to use a significant amount of memory at scale because of
the 288 five-minute intervals in 24 hours, with each record going into all 288
window states. As the application runs through its first 24 hours, memory
should grow while all 288 (× unique keys) windows fill with incoming records,
but after 24 hours the memory should stop growing, or at least grow at a
different rate?
Also of note, we are using an FsStateBackend configuration and plan to move to
RocksDBStateBackend, but from what I can tell, this would only reduce memory
usage and delay hitting the heap capacity, not stave it off forever?
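For reference, a minimal sketch of that switch, assuming the Flink 1.10-era API
and the flink-statebackend-rocksdb dependency; the checkpoint URI is a
placeholder. With RocksDB, keyed and window state is kept in native memory and
on disk rather than on the JVM heap:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Placeholder checkpoint URI; the second argument enables incremental checkpoints.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        // ... rest of the pipeline unchanged ...
        env.execute("rocksdb-backend-sketch");
    }
}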
Thanks
Chris
On 5/18/20, 7:29 AM, "Aljoscha Krettek" <aljos...@apache.org> wrote:
On 15.05.20 15:17, Slotterback, Chris wrote:
> My understanding is that while all these windows build their memory state, I
> can expect heap memory to grow for the 24 hour length of the
> SlidingEventTimeWindow, and then start to flatten as the t-24hr window frames
> expire and release back to the JVM. What is actually happening is when a
> constant data source feeds the stream, the heap memory profile grows linearly
> past the 24 hour mark. Could this be a result of a misunderstanding of how
> the window's memory states are kept, or is my assumption correct, and it is
> more likely I have a leak somewhere?
Will memory keep growing indefinitely? That would indicate a bug? What
sort of lateness/watermark settings do you have? What window function do you
use? ProcessWindowFunction, or sth that aggregates?
Side note: with sliding windows of 24h/5min you will have a "write
amplification" of 24*60/5 = 288; each record will be in 288 windows, which
will each be kept in separate state?
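To make the difference concrete: a ProcessWindowFunction buffers every record
of a window in state until the window fires, so with 24h/5min slides each
record is physically held in all 288 overlapping windows, whereas an
AggregateFunction only keeps one small accumulator per window. A sketch of the
buffering variant (illustrative only; Event and keyedEvents are hypothetical):

import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// keyedEvents is assumed to be a KeyedStream<Event, String>.
keyedEvents
    .window(SlidingEventTimeWindows.of(Time.hours(24), Time.minutes(5)))
    .process(new ProcessWindowFunction<Event, Long, String, TimeWindow>() {
        @Override
        public void process(String key, Context ctx,
                            Iterable<Event> elements, Collector<Long> out) {
            // All elements of the window are available here because they were
            // buffered in window state; that buffering is what drives heap usage up.
            long count = 0;
            for (Event e : elements) {
                count++;
            }
            out.collect(count);
        }
    });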
Best,
Aljoscha