Any suggestions? I know these are very general issues, but they are edge conditions on which we would like the community's general advice.
On Sun, Jan 21, 2018 at 3:16 PM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
> There have been a couple of instances where one of our TMs was quarantined
> (the cause is irrelevant to this discussion), and we had to bounce the TM
> to bring sanity back to the cluster. There have been discussions around
> this, and I am trying to distill them. My questions are:
>
> * Based on https://issues.apache.org/jira/browse/FLINK-3347, is it
> advisable to set taskmanager.exit-on-fatal-akka-error to true?
>
> * Is akka.ask.timeout relevant here? We could increase it beyond 10s,
> but in your experience, is that more of a "mask the issue" exercise, or
> is 10s generally a low value that *should* be increased?
>
> * Is it possible, or is there some effort underway, to address per-job
> memory/resource consumption in a multi-job setup, which is very common
> with Flink?
>
> * Is there an effort to monitor RocksDB usage (off-heap and otherwise)?
> It seems to be a black box to users as of today.
>
> Thank you and regards.