Hello,

I read in the docs that Kafka Streams stores the computed aggregations in a
local embedded key-value store (RocksDB by default), i.e., Kafka Streams
provides so-called state stores. I'm wondering about the relationship
between each state store and its replicated changelog Kafka topic.

Using the WordCountDemo example, I would like to walk through the
sequence of events to understand both concepts when we execute something
like this:

I send the words "hello", "world", "hello" to the input topic.

Changelog topic messages:

"hello" => 1
"world" => 1
"hello" => 2

So far, so good.
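To make the walkthrough above concrete, here is my mental model as a toy sketch (plain Python, my own naming, not real Kafka Streams code): the aggregation updates a local key-value store and emits each new value to the changelog.

```python
# Toy model of a count aggregation (NOT Kafka code).
# "store" stands in for the local RocksDB state store; "changelog"
# stands in for the replicated changelog topic.

def process_words(words):
    store = {}       # local state store (RocksDB by default in Kafka Streams)
    changelog = []   # records appended to the changelog topic
    for word in words:
        store[word] = store.get(word, 0) + 1   # update local state first
        changelog.append((word, store[word]))  # then publish the new count
    return store, changelog

store, changelog = process_words(["hello", "world", "hello"])
# changelog now holds ("hello", 1), ("world", 1), ("hello", 2)
```

Is this roughly the right picture of how the store and the changelog stay in sync?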

Q1) What is the status of RocksDB at this moment?

Q2) If I delete all data in the changelog Kafka topic and send a new
"hello", I can see in the changelog topic:

"hello" => 3

What is happening here? Is countByKey counting from the data still
stored in RocksDB, prior to updating the changelog again?

Q3) Could someone explain the following process to me in depth, step by
step? And what happens if the changelog Kafka topic has been deleted, as
in the question above?

"If tasks run on a machine that fails and are restarted on another machine,
Kafka Streams guarantees to restore their associated state stores to the
content before the failure by replaying the corresponding changelog topics
prior to resuming the processing on the newly started tasks."

Q4) How does fault tolerance differ for operations without an associated
changelog Kafka topic, such as joins between two KTables, when a machine
failure occurs?

Thanks!
