Thank you.

On Mon, Jan 29, 2018 at 3:17 AM, Fabian Hueske <fhue...@gmail.com> wrote:
> Hi Vishal,
>
> Sorry for the late response.
> Till (in CC) might be able to answer your Akka / coordination related
> questions.
>
> Best, Fabian
>
> 2018-01-24 1:22 GMT+01:00 Vishal Santoshi <vishal.santo...@gmail.com>:
>
>> Any suggestions? I know these are very general issues, but these are edge
>> conditions that we want the community to give us general advice on.
>>
>> On Sun, Jan 21, 2018 at 3:16 PM, Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> There have been a couple of instances where one of our TMs was
>>> quarantined (the cause is irrelevant to this discussion), and we had to
>>> bounce the TM to bring back sanity to the cluster. There have been
>>> discussions around this and I am trying to distill them. My questions are:
>>>
>>> * Based on https://issues.apache.org/jira/browse/FLINK-3347, is it
>>> advisable to set taskmanager.exit-on-fatal-akka-error to true?
>>>
>>> * Is akka.ask.timeout relevant here? We could increase the value to
>>> greater than 10s, but based on your experience, is that more of a "mask
>>> the issue" exercise, or is 10s generally a low value that *should* be
>>> increased?
>>>
>>> * Is it possible, or is there some effort being put into, per-job
>>> memory/resource consumption for a multi-job setup, which is very normal
>>> with Flink?
>>>
>>> * Is there an effort to monitor RocksDB usage (off-heap and whatnot)?
>>> It seems a black box to a user as of today.
>>>
>>> Thank you and regards.