[ 
https://issues.apache.org/jira/browse/FLINK-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reopened FLINK-17295:
------------------------------------
      Assignee:     (was: Yangze Guo)

I opened a PR to revert this change: https://github.com/apache/flink/pull/13892

As you can read in FLINK-19805, we found that the ExecutionAttemptID proposed 
in this change does not work well with leader changes: a task holding an 
ExecutionAttemptID from leader 1 can report status updates (failures) to 
leader 2, which will then fail with an unexpected message. 

There are several ideas for how to solve this problem:
a) Add (part of the) leader id to the ExecutionAttemptID (complicated to wire 
into the ExecutionAttemptID generation)
b) Include a random element in the ExecutionAttemptID
c) Introduce a random element in the ExecutionGraph (executionAttemptId) and 
forward that to each ExecutionAttemptID.
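
To make the options above concrete, here is a minimal sketch. It assumes the 
underlying problem is that an ID built only from (ExecutionVertexID, 
attemptNumber) is regenerated identically after a leader change, so leader 2 
cannot tell a stale update from one of its own. All class, field, and method 
names below (AttemptIdSketch, DeterministicAttemptId, SaltedAttemptId, 
graphId) are hypothetical stand-ins, not Flink's actual classes.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical, simplified stand-ins for illustration; not Flink's actual classes. */
final class AttemptIdSketch {

    /** Deterministic ID as in this change: (ExecutionVertexID, attemptNumber). */
    static final class DeterministicAttemptId {
        final String executionVertexId; // stand-in for ExecutionVertexID
        final int attemptNumber;

        DeterministicAttemptId(String executionVertexId, int attemptNumber) {
            this.executionVertexId = executionVertexId;
            this.attemptNumber = attemptNumber;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof DeterministicAttemptId)) {
                return false;
            }
            DeterministicAttemptId that = (DeterministicAttemptId) o;
            return attemptNumber == that.attemptNumber
                    && executionVertexId.equals(that.executionVertexId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(executionVertexId, attemptNumber);
        }
    }

    /** Idea (c): combine the deterministic part with a random per-graph element. */
    static final class SaltedAttemptId {
        final UUID graphId; // assumption: drawn once per ExecutionGraph instance
        final DeterministicAttemptId base;

        SaltedAttemptId(UUID graphId, DeterministicAttemptId base) {
            this.graphId = graphId;
            this.base = base;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SaltedAttemptId)) {
                return false;
            }
            SaltedAttemptId that = (SaltedAttemptId) o;
            return graphId.equals(that.graphId) && base.equals(that.base);
        }

        @Override
        public int hashCode() {
            return Objects.hash(graphId, base);
        }
    }

    /** Both leaders build the first attempt of the same vertex: the IDs are equal. */
    static boolean deterministicIdsCollide() {
        DeterministicAttemptId fromLeader1 = new DeterministicAttemptId("vertex-0", 0);
        DeterministicAttemptId fromLeader2 = new DeterministicAttemptId("vertex-0", 0);
        return fromLeader1.equals(fromLeader2);
    }

    /** Each leader's graph draws its own random element, so the IDs differ. */
    static boolean saltedIdsCollide() {
        SaltedAttemptId fromLeader1 =
                new SaltedAttemptId(UUID.randomUUID(), new DeterministicAttemptId("vertex-0", 0));
        SaltedAttemptId fromLeader2 =
                new SaltedAttemptId(UUID.randomUUID(), new DeterministicAttemptId("vertex-0", 0));
        return fromLeader1.equals(fromLeader2);
    }
}
```

In this sketch the random element is drawn once per graph rather than once 
per attempt, which corresponds to idea c); idea b) would instead draw it 
inside each ID. Either way, an update carrying an ID from the old leader 
would no longer match any ID known to the new leader.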



> Refactor the ExecutionAttemptID to consist of ExecutionVertexID and 
> attemptNumber
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-17295
>                 URL: https://issues.apache.org/jira/browse/FLINK-17295
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Yangze Guo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>
> Make the ExecutionAttemptID be composed of (ExecutionVertexID, 
> attemptNumber).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
