Hi, everyone:

I'm a Flink SQL user, and the version is 1.8.2.
Recently I have been confused about memory and backpressure. I have two jobs on
YARN, and they are frequently killed by YARN because they use too much memory.
The first job has 3 TaskManagers and a parallelism of 6, and each TaskManager
has 8 GB of memory. It reads from Kafka and uses one-minute tumbling windows to
calculate PV and UV. There are many aggregation dimensions, so to avoid data
skew it groups by deviceId, TUMBLE(event_time, INTERVAL '1' MINUTE). My
question is: the checkpoint is only about 60 MB and I give the job 24 GB of
memory in total, so why is it killed by YARN? I use RocksDB as the state
backend and the data volume is large, but I would expect this to trigger
backpressure rather than an OOM kill; however, backpressure never shows up, and
the inPoolUsage metric is normally around 0.45.
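For reference, the first job's query is roughly the following sketch (the
source table name and the userId/pv/uv columns are placeholders to show the
shape of the aggregation, not the exact production names):

    SELECT
      deviceId,
      TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
      COUNT(*) AS pv,                  -- page views per device per minute
      COUNT(DISTINCT userId) AS uv     -- unique visitors; userId is a placeholder column
    FROM kafka_source                  -- placeholder name for the Kafka source table
    GROUP BY
      deviceId,
      TUMBLE(event_time, INTERVAL '1' MINUTE)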
The second job looks a bit different. I use 2 TaskManagers with a parallelism
of 4, and each TaskManager has 20 GB of memory. I define a user-defined
aggregate function to calculate some complex data, grouped by date, hour,
deviceId. It behaves like the first job: OOM kills and no backpressure. But the
confusing part is that when I read one day's worth of data, only one
TaskManager is killed by YARN. According to the dashboard I don't see any data
skew, so why just one TaskManager?
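That query is roughly of the following shape (myComplexAgg and the metric
columns are placeholders for my actual AggregateFunction and fields):

    SELECT
      `date`,
      `hour`,
      deviceId,
      myComplexAgg(metric_a, metric_b) AS complex_result  -- custom AggregateFunction, placeholder name
    FROM kafka_source                                      -- placeholder name for the Kafka source table
    GROUP BY `date`, `hour`, deviceId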
Maybe these are the same issue or maybe not, but I would like to understand
more about how memory is used in Flink, whether backpressure can slow down or
stop the source and how it is triggered, and how using RocksDB affects Flink's
memory usage.
Thanks for reading; any suggestions would be appreciated. Thank you.
