Hi all!

We just published a blog post about how streaming fault tolerance
mechanisms evolved, and what kind of performance Flink gets with its
checkpointing mechanism.

I think it is a pretty interesting read for anyone interested in Flink or
data streaming in general.

The blog post talks about:

  - Fault tolerance techniques, from acknowledgements, through micro
    batches, to transactional updates and distributed snapshots.

  - Flink's performance: throughput, latency, and the tradeoffs between them.

  - A "chaos monkey" experiment in which the computation stays strongly
    consistent even while workers are periodically killed.


Comments welcome!

Greetings,
Stephan
