Hi
Sure, I can share more info. We run Flink 1.4.2 (but have the same
problems on 1.3.2) on an AWS EMR cluster, with 6 taskmanagers on each m4.xlarge
slave. Taskmanager heap is set to 1850. We use the RocksDBStateBackend. We have set
akka.ask.timeout to 60 s in case GC should prevent heartbeats,
yarn.maximum-f
Hi Lasse,
in order to avoid OOM exceptions, you should analyze your Flink job
implementation. Are you creating a lot of objects within your Flink
functions? Which state backend are you using? Maybe you can tell us a
little bit more about your pipeline?
Usually, there should be enough memory fo
Hi.
When our jobs are catching up, they read at 10-20 times the normal rate,
but then we lose our task managers to OOM. We could increase the memory
allocation, but is there a way to figure out how high a rate we can consume with
the current memory and slot allocation and a way to limit t