Re: AKA and quarantine

Vishal Santoshi Mon, 29 Jan 2018 07:08:31 -0800

Thank you.

On Mon, Jan 29, 2018 at 3:17 AM, Fabian Hueske <fhue...@gmail.com> wrote:


> Hi Vishal,
>
> sorry for the late response.
> Till (in CC) might be able to answer your Akka / coordination related
> questions.
>
> Best, Fabian
>
> 2018-01-24 1:22 GMT+01:00 Vishal Santoshi <vishal.santo...@gmail.com>:
>
>> Any suggestions? I know these are very general issues, but they are edge
>> conditions on which we would like the community's general advice.
>>
>> On Sun, Jan 21, 2018 at 3:16 PM, Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> There have been a couple of instances where one of our TMs was
>>> quarantined (the cause is irrelevant to this discussion), and we had to
>>> bounce the TM to bring sanity back to the cluster. There have been
>>> discussions around this and I am trying to distill them. My questions are:
>>>
>>>
>>> * Based on https://issues.apache.org/jira/browse/FLINK-3347, is it
>>> advisable to set taskmanager.exit-on-fatal-akka-error to true?
>>>
>>> * Is akka.ask.timeout relevant here? We could increase the value to
>>> greater than 10s, but based on your experience, is that more of a "mask
>>> the issue" exercise, or is 10s generally a low value that *should* be
>>> increased?
>>>
>>> * Is per-job memory/resource consumption possible, or is some effort
>>> being put into it, for a multi-job setup, which is very common with
>>> Flink?
>>>
>>> * Is there an effort to monitor RocksDB usage (off-heap and so on)?
>>> It seems like a black box to a user as of today.
>>>
>>> Thank you and regards.
>>>
>>>
>>>
>>>
>>>
>>
>
