That's really helpful, thanks Till!
On Thu, Apr 8, 2021 at 6:32 AM Till Rohrmann wrote:
> Hi Kevin,
>
> When decreasing the TaskManager count, I assume that you also decrease the
> parallelism of the Flink job. There are three aspects that can then cause
> a slower recovery.
>
> 1) Each Task get
Hi Kevin,
When decreasing the TaskManager count, I assume that you also decrease the
parallelism of the Flink job. There are three aspects that can then cause
a slower recovery.
1) Each Task gets a larger key range assigned. Therefore, each TaskManager
has to download more data in order to restar
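
As a rough illustration of point 1, here is a back-of-the-envelope sketch in Python. It assumes the state is spread evenly across TaskManagers, which a real job with skewed keys will not match exactly. The 9.7 TB and 156 figures come from the original post in this thread. The halved count of 78 is purely hypothetical, and the function name is made up for this example.

```python
def state_per_task_manager_gb(total_state_tb: float, task_managers: int) -> float:
    """Average state (in GB) each TaskManager has to download on restore.

    Assumes an even key distribution and decimal units (1 TB = 1000 GB).
    """
    if task_managers <= 0:
        raise ValueError("task_managers must be positive")
    return total_state_tb * 1000 / task_managers


# Figures reported in the original post.
baseline_gb = state_per_task_manager_gb(9.7, 156)

# Hypothetical: the same savepoint restored onto half as many TaskManagers.
halved_gb = state_per_task_manager_gb(9.7, 78)

print(f"156 TaskManagers: ~{baseline_gb:.1f} GB each")
print(f" 78 TaskManagers: ~{halved_gb:.1f} GB each")
```

Under this assumption, halving the TaskManager count doubles the data each one has to fetch. That covers only the first of the three aspects mentioned above.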
Hi all,
We are trying to benchmark savepoint size vs. restore time.
One thing we've observed is that when we reduce the number of task
managers, the time to restore from a savepoint increases drastically:
1/ Restoring from a 9.7 TB savepoint onto 156 task managers takes 28 minutes
2/ Restoring from