Hi,
the repartitoning happens indeed as some round-robin algorithm (see
RoundRobinOperatorStateRepartitioner). This repartitioning happens at the level
of the checkpoint coordinator in the master on restore, by redistrubution of
state handles. The state that those handles are pointing to is a b
Hi Seth,
Upon restoring, splits will be re-shuffled among the new tasks, and I believe
that state is repartitioned
in a round robin way (although I am not 100% sure so I am also including Stefan
and Aljoscha in this).
The priority queues will be reconstructed based on the restored elements. So
I am curious about how operator state is repartitioned to subtasks when a job
is resumed from a checkpoint or savepoint. The reason is that I am having
issues with the ContinuousFileReaderOperator when recovering from a failure.
I consume most of my data from files off S3. I have a custom file m