Gary Yao created FLINK-14826:
--------------------------------

             Summary: Enable 'Streaming bucketing end-to-end test' to pass with new DefaultScheduler
                 Key: FLINK-14826
                 URL: https://issues.apache.org/jira/browse/FLINK-14826
             Project: Flink
          Issue Type: Sub-task
          Components: Tests
    Affects Versions: 1.10.0
            Reporter: Gary Yao
            Assignee: Gary Yao
             Fix For: 1.10.0


The test fails because it exhausts the configured number of restarts (3). The 
reason is that the new scheduler may re-schedule tasks faster: it starts 
counting down the restart back-off time as soon as task cancellation is 
triggered, whereas the legacy scheduler only starts counting down after task 
cancellation has finished. Thus, re-scheduled tasks may be deployed onto a TM 
that was already killed, so the restart counter can be incremented several 
times before the loss is detected. How quickly the TM loss is detected depends 
on heartbeat.interval and heartbeat.timeout, which default to 10s and 50s 
respectively. The problem can even be reproduced with the legacy scheduler on 
the current master by setting heartbeat.timeout to a high value, such as 
180000 (ms).
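
The timing argument above can be illustrated with a rough model. This is only 
a sketch, not Flink code: the function name, the 10s back-off and the 5s 
cancellation time are hypothetical values chosen for illustration, and the 
model assumes that every deployment onto the dead TM fails immediately and 
that the TM loss is detected exactly when heartbeat.timeout expires.

```python
def restarts_before_detection(detect_ms, backoff_ms, cancel_ms, count_from_trigger):
    """Count redeployments attempted before the dead TM is detected as lost.

    Simplified model (assumption, not Flink's actual implementation):
    - count_from_trigger=True: the back-off runs concurrently with the
      cancellation (new scheduler), so one cycle takes
      max(backoff_ms, cancel_ms).
    - count_from_trigger=False: the back-off starts only after the
      cancellation has finished (legacy scheduler), so one cycle takes
      cancel_ms + backoff_ms.
    """
    if count_from_trigger:
        cycle_ms = max(backoff_ms, cancel_ms)
    else:
        cycle_ms = cancel_ms + backoff_ms

    restarts = 0
    now_ms = 0
    # Every redeployment that happens before detection lands on the dead TM
    # and therefore consumes one restart.
    while now_ms + cycle_ms < detect_ms:
        now_ms += cycle_ms
        restarts += 1
    return restarts


# Hypothetical timings: 10s back-off, 5s cancellation.
BACKOFF_MS = 10_000
CANCEL_MS = 5_000

# Default heartbeat.timeout of 50s.
print(restarts_before_detection(50_000, BACKOFF_MS, CANCEL_MS, True))    # new scheduler: 4
print(restarts_before_detection(50_000, BACKOFF_MS, CANCEL_MS, False))   # legacy scheduler: 3

# heartbeat.timeout raised to 180000 ms, legacy scheduler.
print(restarts_before_detection(180_000, BACKOFF_MS, CANCEL_MS, False))  # 11
```

Under these assumed timings the shorter cycle of the new scheduler fits more 
failed redeployments into the detection window, and a large heartbeat.timeout 
has the same effect on the legacy scheduler.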



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
