Hey Ramanan, Confirmed. It all happens in commit (at the "checkpoint interval")
The operations are executed serially for each task. The order is exactly as Yi described. The order was chosen with crashes in mind. That is, the checkpoint is not written until state has been updated and output messages have been sent. Note, this could cause state and/or output messages to be rewritten in failure scenarios because after restart the task would resume from the previous checkpoint, reprocessing any incoming messages that had already been processed since that checkpoint. If you'd like to see for yourself, take a look at the commit() method here: https://github.com/apache/samza/blob/f02386464d31b5a496bb0578838f51a0331bfffa/samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala Jake On Wed, Jul 6, 2016 at 10:16 AM, Ramanan, Buvana (Nokia - US) < buvana.rama...@nokia-bell-labs.com> wrote: > Thank you Yi! Please also confirm that all this is done at the exact same > interval - that is checkpoint interval. > > Are the 3 operations you listed done in sequence or in parallel? If > sequential, then what is the order? Can you also comment on server failure > scenarios while this is being carried out - that is, a failure results in a > subset of these operations not completing, in which case the newly spawned > Samza container may have stale state. How likely is that, have you faced it? > > -----Original Message----- > From: Yi Pan [mailto:nickpa...@gmail.com] > Sent: Wednesday, July 06, 2016 1:10 PM > To: dev@samza.apache.org > Subject: Re: flushing changelog & checkpointing > > Hi, Buvana, > > Please see answers below. > > On Tue, Jul 5, 2016 at 11:47 AM, Ramanan, Buvana (Nokia - US) < > buvana.rama...@nokia-bell-labs.com> wrote: > > > > > Does this mean that all writes to the disk for state store purposes > > will be done at the checkpointing time (which is also the time Samza > > checkpoints the incoming stream offsets)? Does this also mean new data > > to the changelog stream will be emitted at checkpointing time? > > > > > Yes, the commit workflow in Samza guarantees that a) pending writes to > on-disk KV-stores are flushed; b) all pending writes to output streams are > flushed, including the writes to changelog streams; c) all input offsets > are flushed to checkpoint. > > Hope the above answers your questions. > > -Yi >