Re: Allocation-preserving scheduling and task-local recovery

2021-09-10 Xiang Zhang
Robert, thank you for your reply! I tried removing "cluster.evenly-spread-out-slots" and then tested two scenarios: 1) restarting the leader job manager; 2) restarting a single task manager. These tests were done in a testing environment where I have six task managers and only four tasks to schedul

Re: Allocation-preserving scheduling and task-local recovery

2021-09-09 Robert Metzger
Hi, from my understanding of the code [1], task scheduling first considers the state location and only then falls back to the evenly-spread-out scheduling strategy. So local recovery should take precedence over the evenly-spread-out strategy. If you can e
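The precedence Robert describes (state location first, evenly spread out as the fallback) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not Flink's scheduler: `TaskManager`, `pick_task_manager`, `schedule`, and the `previous_location` map are hypothetical names, preference is modeled at the TaskManager level rather than per slot allocation, and "evenly spread out" is approximated as "lowest slot utilization".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskManager:
    """Hypothetical, simplified model of a TaskManager and its slots."""
    tm_id: str
    total_slots: int
    used_slots: int = 0

    def free_slots(self) -> int:
        return self.total_slots - self.used_slots

    def utilization(self) -> float:
        return self.used_slots / self.total_slots


def pick_task_manager(task: str,
                      tms: List[TaskManager],
                      previous_location: Dict[str, str]) -> Optional[TaskManager]:
    """Pick a TaskManager: previous state location first, then least-utilized."""
    candidates = [tm for tm in tms if tm.free_slots() > 0]
    if not candidates:
        return None
    # Step 1: prefer the TaskManager that ran this task before, so its
    # local state copy could be reused (task-local recovery).
    preferred = previous_location.get(task)
    for tm in candidates:
        if tm.tm_id == preferred:
            return tm
    # Step 2 (fallback): spread out evenly by choosing the TaskManager
    # with the lowest slot utilization; ties go to the first in the list.
    return min(candidates, key=lambda tm: tm.utilization())


def schedule(tasks: List[str],
             tms: List[TaskManager],
             previous_location: Dict[str, str]) -> Dict[str, str]:
    """Assign every task to a TaskManager id, occupying one slot per task."""
    assignment: Dict[str, str] = {}
    for task in tasks:
        tm = pick_task_manager(task, tms, previous_location)
        if tm is None:
            raise RuntimeError(f"no free slot for task {task}")
        tm.used_slots += 1
        assignment[task] = tm.tm_id
    return assignment


if __name__ == "__main__":
    # Six TaskManagers with two slots each and four tasks, loosely
    # mirroring the test setup described in this thread.
    tms = [TaskManager(f"tm{i}", total_slots=2) for i in range(1, 7)]
    # "tm-gone" stands for a TaskManager that did not come back after a restart.
    previous = {"a": "tm1", "b": "tm1", "c": "tm2", "d": "tm-gone"}
    print(schedule(["a", "b", "c", "d"], tms, previous))
```

In this toy run, tasks `a` and `b` both return to `tm1` even though emptier TaskManagers exist, which is the intended effect of the location preference; only `d`, whose previous location is gone, is placed by the even-spread fallback (on `tm3`).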

Re: Allocation-preserving scheduling and task-local recovery

2021-09-08 Xiang Zhang
We also have this configuration set, in case it makes any difference when allocating tasks: cluster.evenly-spread-out-slots.

On 2021/09/08 18:09:52, Xiang Zhang wrote:
> Hello,
>
> We have an app running on Flink 1.10.2 deployed in standalone mode. We
> enabled task-local recovery by setting bo

Allocation-preserving scheduling and task-local recovery

2021-09-08 Xiang Zhang
Hello, We have an app running on Flink 1.10.2 deployed in standalone mode. We enabled task-local recovery by setting both *state.backend.local-recovery* and *state.backend.rocksdb.localdir*. The app has over 100 task managers and 2 job managers (active and passive). This is what we have observed.
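For reference, the two options named above would normally go into `flink-conf.yaml` as `key: value` lines. The small Python sketch below renders them in that form. `render_flink_conf` is a hypothetical helper, and the RocksDB directory is a placeholder path, not the value used in this deployment.

```python
from typing import Dict, Union


def render_flink_conf(options: Dict[str, Union[bool, str]]) -> str:
    """Hypothetical helper: render options as flink-conf.yaml 'key: value' lines."""
    lines = []
    for key, value in options.items():
        # Booleans are written as lower-case true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_flink_conf({
        "state.backend.local-recovery": True,
        # Placeholder path; point this at a directory on the TaskManager's local disk.
        "state.backend.rocksdb.localdir": "/mnt/local-ssd/flink/rocksdb",
    }))
```

Running it prints `state.backend.local-recovery: true` followed by `state.backend.rocksdb.localdir: /mnt/local-ssd/flink/rocksdb`, one per line.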