Thanks for the heads-up and for explaining how you resolved the issue! Best, Fabian
2017-10-18 3:50 GMT+02:00 ShB <shon.balakris...@gmail.com>:
> I just wanted to leave an update about this issue for anyone else who might
> come across it. The problem was with memory, but it was disk space, not
> heap/off-heap memory. YARN was killing my containers because they exceeded
> the threshold for disk utilization, and this manifested as "Task manager
> was lost/killed" or "JobClientActorConnectionTimeoutException: Lost
> connection to the JobManager". Digging into the NodeManager logs of the
> individual instances provided some hints that it was a disk issue.
>
> Some fixes for this problem:
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> -- can be increased to alleviate the problem temporarily.
> Increasing the disk capacity on each task manager is a more long-term fix.
> Increasing the number of task managers increases the available disk space
> and hence is also a fix.
>
> Thanks!
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
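
For anyone who wants to check whether a node is close to the disk utilization threshold described in the quoted message, here is a minimal Python sketch. It is only an approximation of the idea, not the NodeManager's actual health check. The function names are hypothetical, the 90.0 default is my understanding of the stock value of yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (verify it against your own yarn-site.xml), and the directories to check are whatever your cluster uses for yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.

```python
import shutil

# Assumption: 90.0 is believed to be the stock YARN default for
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.
DEFAULT_MAX_UTILIZATION_PERCENT = 90.0


def utilization_percent(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def is_over_threshold(total_bytes, used_bytes,
                      threshold=DEFAULT_MAX_UTILIZATION_PERCENT):
    """True if utilization is strictly above the given threshold."""
    return utilization_percent(total_bytes, used_bytes) > threshold


def check_dirs(paths, threshold=DEFAULT_MAX_UTILIZATION_PERCENT):
    """Map each directory to whether its filesystem is over the threshold."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        report[path] = is_over_threshold(usage.total, usage.used, threshold)
    return report
```

Calling check_dirs with the NodeManager's local and log directories on each instance gives a rough indication of which disks are likely to be flagged; the numbers may differ slightly from what YARN itself computes.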