Making state in streaming more explicit

Stephan Ewen Thu, 30 Apr 2015 11:48:04 -0700

I was looking into the handling of state in streaming operators, and it is
a bit hidden from the system


Right now, functions can (of they want) put some state into their context.
At runtime, state may occur or not. Before runtime, the system cannot tell
which operators are going to be stateful, and which are going to be
stateless.

I think it is a good idea to expose that. We can use that for optimizations
and we know which operators need to checkpoint state and acknowledge the
asynchronous checkpoint.

At this point, we need to assume that all operators need to send a
confirmation message, which is unnecessary.

Also, I think we should expose which operations want a "commit"
notification after the checkpoint completed. Good examples are

  - the KafkaConsumer source, which can then commit the offset that is safe
to zookeeper

  - a transactional KafkaProduce sink, which can commit a batch of messages
to the kafka partition once the checkpoint is done (to get exactly once
guarantees that include the sink)

Comments welcome!

Greetings,
Stephan

Making state in streaming more explicit

Reply via email to