Re: Akka and quarantine

Vishal Santoshi Tue, 23 Jan 2018 16:22:55 -0800

Any suggestions? I know these are very general issues, but they are edge
conditions on which we would like general advice from the community.


On Sun, Jan 21, 2018 at 3:16 PM, Vishal Santoshi <vishal.santo...@gmail.com>
wrote:

> There have been a couple of instances where one of our TMs was quarantined
> (the cause is irrelevant to this discussion), and we had to bounce the
> TM to restore sanity to the cluster. There have been several discussions
> around this, and I am trying to distill them. My questions are:
>
>
> * Based on https://issues.apache.org/jira/browse/FLINK-3347, is it
> advisable to set taskmanager.exit-on-fatal-akka-error to true?
>
> * Is akka.ask.timeout relevant here? We could increase it beyond 10 s,
> but in your experience, would that merely mask the issue, or is 10 s
> generally a low value that *should* be increased?
>
> * Is it possible to track per-job memory/resource consumption in a
> multi-job setup, which is very common with Flink, or is there an effort
> under way to support this?
>
> * Is there an effort to monitor RocksDB usage (off-heap memory and so on)?
> As of today, it seems like a black box to the user.
>
> Thank you and regards.
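For reference, both settings asked about above are set in flink-conf.yaml. The fragment below is only a minimal sketch: taskmanager.exit-on-fatal-akka-error defaults to false and akka.ask.timeout defaults to 10 s, and the 30 s value shown is an illustrative assumption, not a recommendation.

```yaml
# flink-conf.yaml (sketch; values are illustrative assumptions)

# Make the TaskManager process exit on a fatal Akka error such as being
# quarantined (FLINK-3347), so that an external supervisor (e.g. YARN or
# a process manager) can restart it rather than leaving it running but unusable.
taskmanager.exit-on-fatal-akka-error: true

# Timeout for Akka ask calls; the default is 10 s.
# 30 s is an example value only, not a tuning recommendation.
akka.ask.timeout: 30 s
```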
