Thank you, Yi! Please also confirm that all of this is done at the exact same interval, that is, the checkpoint interval.
Are the 3 operations you listed done in sequence or in parallel? If sequential, what is the order?

Can you also comment on server failure scenarios while this is being carried out? That is, a failure that leaves a subset of these operations incomplete, in which case the newly spawned Samza container may have stale state. How likely is that, and have you faced it?

-----Original Message-----
From: Yi Pan [mailto:[email protected]]
Sent: Wednesday, July 06, 2016 1:10 PM
To: [email protected]
Subject: Re: flushing changelog & checkpointing

Hi, Buvana,

Please see answers below.

On Tue, Jul 5, 2016 at 11:47 AM, Ramanan, Buvana (Nokia - US) <[email protected]> wrote:

> Does this mean that all writes to the disk for state store purposes
> will be done at the checkpointing time (which is also the time Samza
> checkpoints the incoming stream offsets)? Does this also mean new data
> to the changelog stream will be emitted at checkpointing time?

Yes, the commit workflow in Samza guarantees that a) pending writes to on-disk KV-stores are flushed; b) all pending writes to output streams are flushed, including the writes to changelog streams; c) all input offsets are flushed to checkpoint.

Hope the above answers your questions.

-Yi
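
To make the failure question concrete, here is a toy model of the three commit steps (a, b, c) listed above. It is only a sketch and not Samza code: `ToyTask`, `restore`, and the `fail_after` parameter are hypothetical names, and it assumes the steps run sequentially in the order listed, which the thread does not confirm. It also assumes a replacement container rebuilds its state from the changelog and resumes input from the last checkpoint.

```python
# Toy model only: none of these names are Samza APIs.
# Assumption: the commit steps run sequentially as (a) store flush,
# (b) output/changelog flush, (c) checkpoint write.

class ToyTask:
    def __init__(self):
        self.store_pending = {}       # writes not yet flushed to the on-disk store
        self.store_disk = {}          # flushed on-disk store contents
        self.changelog_pending = []   # changelog writes not yet sent
        self.changelog = []           # durable changelog stream
        self.offset = 0               # number of input messages processed so far
        self.checkpoint = 0           # last durably checkpointed offset

    def process(self, key, value):
        # Each processed message updates state and buffers a changelog entry.
        self.store_pending[key] = value
        self.changelog_pending.append((key, value))
        self.offset += 1

    def _flush_store(self):           # step (a)
        self.store_disk.update(self.store_pending)
        self.store_pending.clear()

    def _flush_outputs(self):         # step (b)
        self.changelog.extend(self.changelog_pending)
        self.changelog_pending.clear()

    def _write_checkpoint(self):      # step (c)
        self.checkpoint = self.offset

    def commit(self, fail_after=None):
        # fail_after = number of steps that complete before a simulated crash.
        steps = [self._flush_store, self._flush_outputs, self._write_checkpoint]
        for completed, step in enumerate(steps):
            if fail_after is not None and completed == fail_after:
                raise RuntimeError("simulated crash during commit")
            step()


def restore(changelog, checkpoint):
    # A replacement container in this model replays the changelog to
    # rebuild state, then resumes input from the checkpointed offset.
    state = {}
    for key, value in changelog:
        state[key] = value
    return state, checkpoint


task = ToyTask()
task.process("a", 1)
task.commit()                         # clean commit: checkpoint is now 1
task.process("a", 2)
try:
    task.commit(fail_after=2)         # crash after (a) and (b), before (c)
except RuntimeError:
    pass

state, resume_from = restore(task.changelog, task.checkpoint)
print(state, resume_from)             # {'a': 2} 1
```

In this toy model a crash between steps (b) and (c) leaves the restored state ahead of the checkpoint rather than behind it, so the second message would be processed again after restart. Whether the real commit path behaves the same way depends on the actual ordering, which is the question asked above.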
