Re: Flink Exception - assigned slot container was removed

Flink Developer Mon, 26 Nov 2018 00:32:15 -0800

Thanks for the suggestion, Qi. I tried increasing slot.idle.timeout to 3600000,
but the job still ran into the same issue. Does this mean that a slot or
"Flink worker" that has not processed any items for 1 hour will be removed?


Would any of these other Flink configuration properties help with this?

slot.request.timeout
web.timeout
heartbeat.interval
heartbeat.timeout
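
For reference, a minimal sketch of how these keys would be set in
flink-conf.yaml. Only the slot.idle.timeout value comes from this thread; every
other value is a hypothetical placeholder rather than a recommendation, and all
values are assumed to be in milliseconds.

```yaml
# flink-conf.yaml (sketch; placeholder values unless noted otherwise)

# Value already tried in this thread (1 hour).
slot.idle.timeout: 3600000

# Placeholder: how long a pending slot request may wait.
slot.request.timeout: 600000

# Placeholder: timeout for asynchronous web/REST operations.
web.timeout: 60000

# Placeholders: heartbeat.timeout should stay larger than heartbeat.interval.
heartbeat.interval: 10000
heartbeat.timeout: 120000
```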

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, November 25, 2018 6:56 PM, 罗齐 <[email protected]> wrote:

> Hi,
>
> It looks like some of your slots were freed during job execution
> (possibly because they were idle for too long). AFAIK the exception was
> thrown when a pending slot request was removed. You can try increasing
> "slot.idle.timeout" to mitigate this issue (the default is 50000 ms; try
> 3600000 or higher).
>
> Regards,
> Qi
>
>> On Nov 26, 2018, at 7:36 AM, Flink Developer <[email protected]> 
>> wrote:
>>
>> Hi, I have a Flink application sourcing from a topic in Kafka (400
>> partitions) and sinking to S3 using BucketingSink, with RocksDB for
>> checkpointing every 2 minutes. The Flink app runs with parallelism 400 so
>> that each worker handles one partition. This is using Flink 1.5.2. The Flink
>> cluster uses 10 task managers with 40 slots each.
>>
>> After running for a few days straight, it encounters a Flink exception:
>> org.apache.flink.util.FlinkException: The assigned slot
>> container_1234567_0003_01_000009_1 was removed.
>>
>> This causes the Flink job to fail, and I am unsure what causes it. During
>> this time, I also see some checkpoints reporting "checkpoint was declined
>> (tasks not ready)". At this point, the job is unable to recover and fails.
>> Does this happen if a slot or worker has not processed anything for X
>> amount of time? Would I need to increase the following Flink config
>> properties when creating the Flink cluster on YARN?
>>
>> slot.idle.timeout
>> slot.request.timeout
>> web.timeout
>> heartbeat.interval
>> heartbeat.timeout
>>
>> Any help would be greatly appreciated.
