Re: OperatorState partitioning when recovering from failure

2017-05-04 Thread Stefan Richter
Hi, the repartitioning does indeed happen via a round-robin algorithm (see RoundRobinOperatorStateRepartitioner). This repartitioning happens at the level of the checkpoint coordinator in the master on restore, by redistribution of state handles. The state that those handles are pointing to is a b
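
To make the round-robin idea concrete, here is a minimal sketch in Python. It is not Flink's RoundRobinOperatorStateRepartitioner; the function name `round_robin_repartition` and the string handles are hypothetical, and the sketch assumes each state handle can be treated as one indivisible unit that is dealt out to the new subtasks in turn.

```python
def round_robin_repartition(handles, new_parallelism):
    """Deal state handles out to subtasks in round-robin order.

    Hypothetical illustration only: `handles` is any list of opaque
    state handles, `new_parallelism` is the number of subtasks after
    restore. Returns one list of handles per subtask.
    """
    if new_parallelism < 1:
        raise ValueError("new_parallelism must be at least 1")

    # One (initially empty) bucket per new subtask.
    assignment = [[] for _ in range(new_parallelism)]

    # Handle i goes to subtask i modulo the new parallelism.
    for index, handle in enumerate(handles):
        assignment[index % new_parallelism].append(handle)

    return assignment


if __name__ == "__main__":
    # Five handles from the old job, restored with parallelism 2.
    print(round_robin_repartition(["h0", "h1", "h2", "h3", "h4"], 2))
```

Only the handles are moved around in this sketch, which mirrors the point above that redistribution works on state handles in the master rather than on the state itself.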

Re: OperatorState partitioning when recovering from failure

2017-05-04 Thread Kostas Kloudas
Hi Seth, upon restoring, splits will be re-shuffled among the new tasks, and I believe that state is repartitioned in a round-robin fashion (although I am not 100% sure, so I am also including Stefan and Aljoscha in this). The priority queues will be reconstructed based on the restored elements. So
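
As a rough illustration of rebuilding a priority queue from restored elements, the Python sketch below re-heapifies whatever splits a subtask receives on restore. The tuple layout `(modification_time, path, split_number)` and the function names are assumptions made for this example, not the operator's actual data structures.

```python
import heapq


def rebuild_pending_queue(restored_splits):
    """Rebuild a min-heap from the splits handed to one subtask.

    Assumption for illustration: each split is a tuple
    (modification_time, path, split_number), so plain tuple
    comparison defines the priority.
    """
    queue = list(restored_splits)  # copy, so the input is not mutated
    heapq.heapify(queue)           # restore the heap invariant in O(n)
    return queue


def drain(queue):
    """Pop every split in priority order, emptying the queue."""
    ordered = []
    while queue:
        ordered.append(heapq.heappop(queue))
    return ordered


if __name__ == "__main__":
    # Splits arrive in arbitrary order after being re-shuffled.
    restored = [(30, "c", 0), (10, "a", 1), (10, "a", 0)]
    print(drain(rebuild_pending_queue(restored)))
```

In this sketch the resulting order covers only the splits that a given subtask was assigned, since the heap is built from that subtask's restored elements alone.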

OperatorState partitioning when recovering from failure

2017-05-04 Thread Seth Wiesman
I am curious about how operator state is repartitioned to subtasks when a job is resumed from a checkpoint or savepoint. The reason is that I am having issues with the ContinuousFileReaderOperator when recovering from a failure. I consume most of my data from files off S3. I have a custom file m