Re: AKA and quarantine

Fabian Hueske Mon, 29 Jan 2018 00:20:32 -0800

Hi Vishal,

Sorry for the late response.
Till (in CC) might be able to answer your Akka / coordination related
questions.


Best, Fabian

2018-01-24 1:22 GMT+01:00 Vishal Santoshi <vishal.santo...@gmail.com>:

> Any suggestions? I know these are very general issues, but they are edge
> conditions on which we would like the community's general advice.
>
> On Sun, Jan 21, 2018 at 3:16 PM, Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> There have been a couple of instances where one of our TMs was
>> quarantined (the cause is irrelevant to this discussion), and we had to
>> bounce the TM to bring sanity back to the cluster. There have been
>> discussions around this, and I am trying to distill them. My questions are:
>>
>>
>> * Based on https://issues.apache.org/jira/browse/FLINK-3347, is it
>> advisable to set taskmanager.exit-on-fatal-akka-error to true?
>>
>> * Is akka.ask.timeout relevant here? We could increase the value beyond
>> 10s, but in your experience is that more of a "mask the issue" exercise,
>> or is 10s generally a low value that *should* be increased?
>>
>> * Is there a way to handle memory/resource consumption per job, or is
>> some effort being put into this, for the multi-job setups that are very
>> normal with Flink?
>>
>> * Is there an effort to monitor RocksDB usage (off-heap and so on)?
>> It seems like a black box to the user as of today.
>>
>> Thank you and regards.
>>
>>
>>
>>
>>
>
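
For readers following the thread, the two settings named in the questions above would sit in flink-conf.yaml roughly as sketched here. This is only an illustration of where the keys go: the values are assumptions chosen for the example, not recommendations made anywhere in this thread, and whether to change either one is exactly what the questions ask.

```yaml
# flink-conf.yaml (illustrative sketch; values are assumptions, not advice)

# Terminate the TaskManager process on a fatal Akka error such as a
# quarantine event, so that an external supervisor can restart it
# instead of someone bouncing it by hand. Setting it to true is the
# option the first question asks about.
taskmanager.exit-on-fatal-akka-error: true

# Timeout for Akka ask calls. The thread mentions 10s as the current
# value; 30 s is a hypothetical larger value, shown only for the syntax.
akka.ask.timeout: 30 s
```

With the first option enabled, the TaskManager only exits; something outside Flink (for example YARN or an init system) would still have to bring it back up.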
