I was looking into the handling of state in streaming operators, and it is a bit hidden from the system
Right now, functions can (of they want) put some state into their context. At runtime, state may occur or not. Before runtime, the system cannot tell which operators are going to be stateful, and which are going to be stateless. I think it is a good idea to expose that. We can use that for optimizations and we know which operators need to checkpoint state and acknowledge the asynchronous checkpoint. At this point, we need to assume that all operators need to send a confirmation message, which is unnecessary. Also, I think we should expose which operations want a "commit" notification after the checkpoint completed. Good examples are - the KafkaConsumer source, which can then commit the offset that is safe to zookeeper - a transactional KafkaProduce sink, which can commit a batch of messages to the kafka partition once the checkpoint is done (to get exactly once guarantees that include the sink) Comments welcome! Greetings, Stephan