Hello,

I am looking for more details on the point in the processing cycle at which 
the state store in Samza is written to disk. I noted the following statement 
in the Samza Stateful Processing section:

"Samza includes an additional in-memory caching layer in front of RocksDB, 
which avoids the cost of deserialization for frequently-accessed objects and 
batches writes. If the same key is updated multiple times in quick succession, 
the batching coalesces those updates into a single write. The writes are 
flushed to the changelog when a task commits."

Here, "commits" is actually a hyperlink, and clicking on it takes me to the 
checkpointing section, which I have read many times from the point of view of 
input stream offset commits.

Does this mean that all writes to the disk for state store purposes will be 
done at the checkpointing time (which is also the time Samza checkpoints the 
incoming stream offsets)? Does this also mean new data to the changelog stream 
will be emitted at checkpointing time?
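
To make the question concrete, the behavior described in the quoted passage 
could be sketched roughly as below. This is only an illustration of one 
reading of the documentation, not Samza's actual implementation: the class 
WriteBatchingCache and its methods are hypothetical, the two lists stand in 
for RocksDB and the changelog stream, and the assumption that nothing is 
flushed before commit() is exactly the point being asked about.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a write-batching cache in front of a store.
// Not Samza code; names and flush policy are assumptions for illustration.
class WriteBatchingCache {
    // Pending updates, keyed so that repeated writes to one key coalesce.
    private final Map<String, String> dirty = new LinkedHashMap<>();

    // Stand-ins for the on-disk store (RocksDB) and the changelog stream.
    final List<String> storeWrites = new ArrayList<>();
    final List<String> changelogWrites = new ArrayList<>();

    // Buffer the update in memory; a later put to the same key overwrites it.
    void put(String key, String value) {
        dirty.put(key, value);
    }

    // Assumed flush point: push each coalesced entry to both sinks once.
    void commit() {
        for (Map.Entry<String, String> e : dirty.entrySet()) {
            String record = e.getKey() + "=" + e.getValue();
            storeWrites.add(record);
            changelogWrites.add(record);
        }
        dirty.clear();
    }

    public static void main(String[] args) {
        WriteBatchingCache cache = new WriteBatchingCache();
        cache.put("a", "1");
        cache.put("a", "2"); // coalesces with the previous write to "a"
        cache.put("b", "3");
        System.out.println(cache.changelogWrites); // nothing flushed yet
        cache.commit();
        System.out.println(cache.changelogWrites); // one record per key
    }
}
```

In this model, three puts produce only two records at commit time, and 
nothing reaches the store or the changelog before then. Whether Samza also 
flushes at other times (for example, when the batch fills up) is part of 
what I would like to confirm.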

Thanks,
Buvana
