how to start tuning to prevent OutOfMemory

2016-07-27 Thread Istvan Soos
Hi, We see an occasional OOM issue with our Flink jobs. Maybe the input got more diverse and the grouping has many more keys; I'm not really sure about that part. How do you usually tackle these issues? We are running with parallelism between 5 and 30. Would it help if we turned that down? We do set

fault tolerance: suspend and resume?

2016-07-27 Thread Istvan Soos
Hi, I was wondering how Flink's fault tolerance works, because this page is short on details: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/fault_tolerance.html My environment has a backend service that may be down for a couple of hours (sad, but we are working on fixing that). I