Hi,
from my understanding of the code [1], the task scheduling first considers
the state location, and then uses the evenly-spread-out scheduling strategy
as a fallback. So in my understanding of the code, local recovery
should take preference over the evenly-spread-out strategy.

If you can easily test it, I would still recommend removing the
"cluster.evenly-spread-out-slots" setting, just to make sure my
understanding is really correct.

I don't think that's the case, but just to make sure: You are only
restarting a single task manager, right? The other task managers keep
running? (Afaik the state information is lost if a TaskManager restarts)

Sorry that I don't have a real answer here (yet). Is there anything
suspicious in the JobManager or TaskManager logs?


[1]
https://github.com/apache/flink/blob/release-1.10/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/PreviousAllocationSlotSelectionStrategy.java

On Wed, Sep 8, 2021 at 9:44 PM Xiang Zhang <xi...@stripe.com> wrote:

> We also have this configuration set, in case it makes any difference when
> allocating tasks: cluster.evenly-spread-out-slots.
>
> On 2021/09/08 18:09:52, Xiang Zhang <xi...@stripe.com> wrote:
> > Hello,
> >
> > We have an app running on Flink 1.10.2 deployed in standalone mode. We
> > enabled task-local recovery by setting both
> *state.backend.local-recovery *and
> > *state.backend.rocksdb.localdir*. The app has over 100 task managers and
> 2
> > job managers (active and passive).
> >
> > This is what we have observed. When we restarted a task manager, all
> tasks
> > got canceled (due to the default failover configuration
> > <
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/execution/task_failure_recovery/#failover-strategies
> >).
> > Then these tasks were re-distributed among the task managers (e.g. some
> > task managers had more slots used than before the restart). This caused all
> > task managers to download state from remote storage all over again.
> >
> > The same thing happened when we restarted a job manager. The job manager
> > failed over to the passive one successfully, however all tasks were
> > canceled and reallocated among the task managers again.
> >
> > My understanding is that if task-local recovery is enabled, Flink will
> try
> > to enable sticky assignment of tasks to previous task managers they run
> on.
> > This doesn't seem to be the case. My question is how we can enable
> > this allocation-preserving
> > scheduling
> > <
> https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/large_state_tuning/#allocation-preserving-scheduling
> >
> > when Flink handles failures.
> >
> > Thanks,
> > Xiang
> >
>
