We also have this configuration set, in case it makes any difference when allocating tasks: cluster.evenly-spread-out-slots.
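A minimal sketch of how that looks in our flink-conf.yaml (everything else left at its default):

    # Spread slots across all available task managers instead of
    # filling one task manager before moving to the next.
    cluster.evenly-spread-out-slots: true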
On 2021/09/08 18:09:52, Xiang Zhang <xi...@stripe.com> wrote:
> Hello,
>
> We have an app running on Flink 1.10.2, deployed in standalone mode. We
> enabled task-local recovery by setting both *state.backend.local-recovery*
> and *state.backend.rocksdb.localdir*. The app has over 100 task managers
> and 2 job managers (one active, one passive).
>
> This is what we have observed. When we restarted a task manager, all tasks
> got canceled (due to the default failover configuration
> <https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/execution/task_failure_recovery/#failover-strategies>).
> These tasks were then redistributed among the task managers (e.g. some
> task managers had more slots in use than before the restart). This caused
> all task managers to download state from remote storage all over again.
>
> The same thing happened when we restarted a job manager. The job manager
> failed over to the passive one successfully, but all tasks were canceled
> and reallocated among the task managers again.
>
> My understanding is that when task-local recovery is enabled, Flink tries
> to make task assignment sticky, scheduling tasks back to the task managers
> they previously ran on. That doesn't seem to happen here. My question is
> how we can enable this allocation-preserving scheduling
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/large_state_tuning/#allocation-preserving-scheduling>
> when Flink handles failures.
>
> Thanks,
> Xiang
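For anyone comparing setups: a sketch of the task-local recovery settings from the message above as they would appear in flink-conf.yaml (the local directory path is illustrative, not our actual mount):

    # Enable task-local recovery so a task restored on the same task manager
    # can read state from local disk instead of remote storage.
    state.backend.local-recovery: true
    # Working directory for RocksDB state on each task manager;
    # the path below is a placeholder.
    state.backend.rocksdb.localdir: /mnt/flink/rocksdb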