Hi - I have some questions regarding Flink's checkpointing, specifically
related to storing state in the backends.

So let's say an operator in a streaming job is building up some state. When
it receives barriers from all of its input streams, does it store *all* of
its state to the backend? I think that is what the docs [1] and paper [2]
imply, but I want to make sure. In other words, if the operator contains
100MB of state, and the backend is HDFS, does the operator copy all 100MB
of state to HDFS during the checkpoint?

Following on from this example, say the operator is a global window that
stores some state for each unique key observed in the stream of messages
(e.g. userId). Assume that over time the number of observed unique keys
grows, so the size of the state also grows (the window state is never
purged). Is the entire operator state at the time of each checkpoint stored
to the backend, so that over time the size of each checkpoint grows? Or
does the backend somehow receive only the state that has changed in some
way since the last checkpoint?
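To make the question concrete, here is a toy Python model (not Flink code; the names and structure are my own invention) contrasting the two behaviors I'm asking about: snapshotting the full per-key state at every checkpoint versus snapshotting only the keys changed since the last checkpoint.

```python
# Toy model of a keyed operator, illustrating full vs. incremental
# checkpoints. Purely illustrative -- this is not how Flink is implemented.
import copy

class Operator:
    def __init__(self):
        self.state = {}    # per-key window state, e.g. keyed by userId
        self.dirty = set() # keys modified since the last checkpoint

    def process(self, user_id, value):
        self.state.setdefault(user_id, []).append(value)
        self.dirty.add(user_id)

    def full_checkpoint(self):
        # Strategy (a): the whole operator state goes to the backend
        # at every checkpoint, so checkpoint size tracks total state size.
        return copy.deepcopy(self.state)

    def incremental_checkpoint(self):
        # Strategy (b): only the delta since the previous checkpoint.
        delta = {k: copy.deepcopy(self.state[k]) for k in self.dirty}
        self.dirty.clear()
        return delta

op = Operator()
op.process("user1", 1)
op.process("user2", 2)
cp1_full = op.full_checkpoint()
cp1_incr = op.incremental_checkpoint()

op.process("user3", 3)                   # one new key arrives
cp2_full = op.full_checkpoint()          # grows with total keys seen: 3 keys
cp2_incr = op.incremental_checkpoint()   # contains only the changed key: 1 key
print(len(cp2_full), len(cp2_incr))      # 3 1
```

With strategy (a), every checkpoint of my growing global window would copy the entire (ever-larger) state to HDFS; with strategy (b), each checkpoint would stay proportional to the churn since the last one. I'd like to know which of these Flink actually does.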

Are old checkpoint states in the backend ever deleted / cleaned up? That
is, if the state for checkpoint n in the backend is all that is needed to
restore a failed job, then the state for all checkpoints m < n should no
longer be needed, right? Can all of those old checkpoints be deleted from
the backend? Does Flink do this?
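In other words, the cleanup policy I have in mind looks something like this toy sketch (again, my own invention, not Flink code), where completing checkpoint n lets the backend discard everything older:

```python
# Toy sketch of "retain only the newest checkpoint" cleanup.
# `completed` stands in for checkpoint directories in HDFS.
completed = {}  # checkpoint_id -> snapshot data

def complete_checkpoint(cp_id, snapshot, retain=1):
    completed[cp_id] = snapshot
    # Once this checkpoint is durable, checkpoints older than the newest
    # `retain` are no longer needed for recovery and can be deleted.
    for old_id in sorted(completed)[:-retain]:
        del completed[old_id]

complete_checkpoint(1, "state@1")
complete_checkpoint(2, "state@2")
complete_checkpoint(3, "state@3")
print(sorted(completed))  # [3] -- only the latest checkpoint survives
```

Does Flink perform this kind of cleanup automatically, or is the user expected to prune old checkpoint data from the backend?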

Thanks,
Zach

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/stream_checkpointing.html
[2] http://arxiv.org/abs/1506.08603
