[jira] [Commented] (FLINK-20605) DeclarativeSlotManager crashes if slot allocation notification is processed after taskexecutor shutdown

Till Rohrmann (Jira) Tue, 15 Dec 2020 02:59:39 -0800


    [ https://issues.apache.org/jira/browse/FLINK-20605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249624#comment-17249624 ]


Till Rohrmann commented on FLINK-20605:
---------------------------------------

I guess for the first problem we need to check whether it is still valid before 
processing the {{handleAsync}} callback.
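
A minimal sketch of such a validity check, assuming the slot manager tracks pending allocations in a map and runs the {{handleAsync}} callback on a single-threaded main-thread executor. The class {{AllocationCallbacks}} and its methods are hypothetical and are not Flink's actual {{DeclarativeSlotManager}} code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch, not Flink code: the callback first checks that the
// pending allocation is still valid before applying the RPC result.
class AllocationCallbacks {
    // allocationId -> taskExecutorId for allocations whose RPC is in flight
    private final Map<String, String> pendingAllocations = new HashMap<>();
    // allocationId -> taskExecutorId for confirmed allocations
    private final Map<String, String> allocatedSlots = new HashMap<>();
    // Assumed to be single-threaded (like a main-thread executor), so the
    // callback never races with cancelAllocation().
    private final Executor mainThreadExecutor;

    AllocationCallbacks(Executor mainThreadExecutor) {
        this.mainThreadExecutor = mainThreadExecutor;
    }

    void startAllocation(String allocationId, String taskExecutorId, CompletableFuture<Void> rpcResponse) {
        pendingAllocations.put(allocationId, taskExecutorId);
        rpcResponse.handleAsync(
                (ignored, failure) -> {
                    // Validity check: the allocation may have been cancelled (e.g. because
                    // its task executor unregistered) while the RPC was in flight.
                    String owner = pendingAllocations.remove(allocationId);
                    if (owner == null) {
                        return null; // stale callback: drop it instead of failing a precondition
                    }
                    if (failure == null) {
                        allocatedSlots.put(allocationId, owner);
                    }
                    return null;
                },
                mainThreadExecutor);
    }

    void cancelAllocation(String allocationId) {
        pendingAllocations.remove(allocationId);
    }

    boolean isAllocated(String allocationId) {
        return allocatedSlots.containsKey(allocationId);
    }
}
```

Passing {{Runnable::run}} as the executor runs the callback synchronously, which is convenient for trying the sketch out.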

For the second problem, we might be missing a check for whether we have already been shut down.
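
A minimal sketch of such a shutdown guard, assuming a simple {{started}} flag that is cleared on close. {{SlotStatusReceiver}} and its methods are hypothetical names used only for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Flink code: notifications that arrive after
// close() are dropped instead of being applied to already-cleared state.
class SlotStatusReceiver {
    private boolean started = true;
    private final Set<String> allocatedSlots = new HashSet<>();

    void close() {
        started = false;
        allocatedSlots.clear();
    }

    /** Returns true if the notification was applied, false if it was ignored. */
    boolean onSlotAllocated(String slotId) {
        if (!started) {
            return false; // shut down: ignore late notifications
        }
        allocatedSlots.add(slotId);
        return true;
    }
}
```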

For the third problem, we either tolerate duplicate status updates or need to 
enforce on the sender side that each update is sent only once. In general, the former 
approach should be more robust.
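
A minimal sketch of the "tolerate duplicates" approach, assuming each slot status update carries an allocation ID so that a repeated update can be told apart from a conflicting one. {{SlotTracker}} and {{Outcome}} are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Flink code: marking a slot as allocated is
// idempotent for the same allocation ID, so a duplicate update is a no-op.
class SlotTracker {
    enum Outcome { APPLIED, DUPLICATE_IGNORED, CONFLICT }

    // slotId -> allocationId
    private final Map<String, String> slotToAllocation = new HashMap<>();

    Outcome markAllocated(String slotId, String allocationId) {
        String current = slotToAllocation.get(slotId);
        if (current == null) {
            slotToAllocation.put(slotId, allocationId);
            return Outcome.APPLIED;
        }
        if (current.equals(allocationId)) {
            return Outcome.DUPLICATE_IGNORED; // same update seen again: tolerate it
        }
        return Outcome.CONFLICT; // a different allocation claims the slot
    }
}
```

The design choice here is that only a genuinely conflicting update is surfaced; the receiver does not need to know how often the sender retried.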

> DeclarativeSlotManager crashes if slot allocation notification is processed 
> after taskexecutor shutdown
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20605
>                 URL: https://issues.apache.org/jira/browse/FLINK-20605
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.0
>
>
> A notification from a task executor about a slot being allocated can be 
> processed after that very task executor has unregistered itself from the 
> resource manager.
> As a result, we run into an exception when trying to mark this slot as 
> allocated: the slot no longer exists, and a precondition catches this case.
> We could solve this by checking in 
> {{DeclarativeResourceManager#allocateSlot}} whether the task executor we 
> received the acknowledgement from is still registered.
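
A minimal sketch of the proposed registration check, assuming slots are tracked per task executor and all of a task executor's slots are dropped when it unregisters. {{RegistrationAwareSlotManager}} and its methods are hypothetical and only mirror the idea, not the real {{allocateSlot}} signature:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Flink code: a late allocation acknowledgement from
// an unregistered task executor is ignored instead of tripping a precondition.
class RegistrationAwareSlotManager {
    // taskExecutorId -> slot IDs it offered at registration
    private final Map<String, Set<String>> slotsByTaskExecutor = new HashMap<>();
    private final Set<String> allocatedSlots = new HashSet<>();

    void registerTaskExecutor(String taskExecutorId, Set<String> slotIds) {
        slotsByTaskExecutor.put(taskExecutorId, new HashSet<>(slotIds));
    }

    void unregisterTaskExecutor(String taskExecutorId) {
        Set<String> slots = slotsByTaskExecutor.remove(taskExecutorId);
        if (slots != null) {
            allocatedSlots.removeAll(slots);
        }
    }

    /** Returns false if the acknowledging task executor is no longer registered. */
    boolean allocateSlot(String taskExecutorId, String slotId) {
        Set<String> slots = slotsByTaskExecutor.get(taskExecutorId);
        if (slots == null) {
            return false; // task executor already unregistered: drop the late ack
        }
        if (!slots.contains(slotId)) {
            // A registered task executor reporting an unknown slot is still a real error.
            throw new IllegalStateException("Unknown slot " + slotId);
        }
        allocatedSlots.add(slotId);
        return true;
    }

    boolean isAllocated(String slotId) {
        return allocatedSlots.contains(slotId);
    }
}
```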



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
