Great, thanks all!
Regards
Sab
On Tue, Apr 19, 2016 at 5:00 AM, Yi Pan wrote:
Hi, Sabarish Sasidharan,
The key point is to make your KV-store update idempotent. So, if the offset
associated with the aggregated value is written in the same row in RocksDB
(i.e. atomicity is achieved here), I think that your approach would work.
As Robert mentioned, offsets are always committed after the stores have been
flushed.
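Just to illustrate what I mean, here is a rough sketch (the task class, the
store name "aggregate-store", and the AggregateWithOffset type are made up
for the example; it assumes the plain StreamTask/InitableTask API with a
RocksDB store that has a changelog):

import org.apache.samza.config.Config
import org.apache.samza.storage.kv.KeyValueStore
import org.apache.samza.system.IncomingMessageEnvelope
import org.apache.samza.task._

// Illustrative value type: the aggregate and the offset that produced it live
// in the same RocksDB row, so a single put() updates both together.
case class AggregateWithOffset(sum: Long, lastOffset: String)

class IdempotentAggregateTask extends StreamTask with InitableTask {

  private var store: KeyValueStore[String, AggregateWithOffset] = _

  override def init(config: Config, context: TaskContext): Unit =
    // "aggregate-store" is an illustrative store name configured with a changelog.
    store = context.getStore("aggregate-store")
      .asInstanceOf[KeyValueStore[String, AggregateWithOffset]]

  override def process(envelope: IncomingMessageEnvelope,
                       collector: MessageCollector,
                       coordinator: TaskCoordinator): Unit = {
    val key    = envelope.getKey.asInstanceOf[String]
    val amount = envelope.getMessage.asInstanceOf[Long]
    val offset = envelope.getOffset

    val current = Option(store.get(key)).getOrElse(AggregateWithOffset(0L, ""))

    // The new aggregate and the offset that produced it go into one row, so a
    // replayed message can be detected and skipped on recovery.
    store.put(key, AggregateWithOffset(current.sum + amount, offset))
  }
}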
Looking at:
https://github.com/apache/samza/blob/f02386464d31b5a496bb0578838f51a0331bfffa/samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala#L171
The commit function, in order, does:
1. Flushes metrics
2. Flushes stores
3. Produces messages from the collectors
4. Writes offsets
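So the checkpoint offsets go out last, after the stores (and their changelogs)
have been flushed. Roughly, the ordering is something like this (the names
below are simplified stand-ins, not the actual members of TaskInstance; see
the linked source for the real code):

// Rough paraphrase of the ordering in TaskInstance.commit() linked above;
// these are placeholder names for the four steps, not the real fields.
def commit(): Unit = {
  metrics.commits.inc()       // 1. flushes metrics
  storageManager.flush()      // 2. flushes stores (and their changelogs)
  collector.flush()           // 3. produces messages from the collectors
  offsetManager.checkpoint()  // 4. writes the checkpoint offsets, last
}

A failure between step 2 and step 4 therefore leaves the store ahead of the
checkpoint, so on restart some messages will be reprocessed.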
Hi Guozhang
Thanks. Assuming the checkpoint would typically be behind the offset
persisted in my store (+ changelog), when messages are replayed starting
from the checkpoint, I can skip the ones already applied by comparing their
offsets against the offset in my store, right? So I am not understanding why
duplicates would still be a problem.
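Something like this is what I have in mind, assuming the aggregate and the
offset of the last applied message sit in the same row (all names here are
made up for illustration; the numeric offset comparison assumes Kafka-style
per-partition offsets, with all updates for a key coming from the single
partition this task owns):

import org.apache.samza.config.Config
import org.apache.samza.storage.kv.KeyValueStore
import org.apache.samza.system.IncomingMessageEnvelope
import org.apache.samza.task._

// Aggregate plus the offset of the last message folded into it.
case class AggregateWithOffset(sum: Long, lastOffset: String)

// Skips replayed messages by comparing the incoming offset with the offset
// stored next to the aggregate.
class DedupAggregateTask extends StreamTask with InitableTask {

  private var store: KeyValueStore[String, AggregateWithOffset] = _

  override def init(config: Config, context: TaskContext): Unit =
    store = context.getStore("aggregate-store")
      .asInstanceOf[KeyValueStore[String, AggregateWithOffset]]

  override def process(envelope: IncomingMessageEnvelope,
                       collector: MessageCollector,
                       coordinator: TaskCoordinator): Unit = {
    val key     = envelope.getKey.asInstanceOf[String]
    val offset  = envelope.getOffset
    val current = Option(store.get(key)).getOrElse(AggregateWithOffset(0L, ""))

    // Replayed message whose effect is already in the store: skip it.
    if (current.lastOffset.nonEmpty && offset.toLong <= current.lastOffset.toLong)
      return

    val amount = envelope.getMessage.asInstanceOf[Long]
    store.put(key, AggregateWithOffset(current.sum + amount, offset))
  }
}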
Hi Sab,
For stateful processing where you have persistent state stores, you need to
keep the checkpoint (which includes the committed offsets) and the flushed
store state in sync, but right now these two operations are not done
atomically, and hence if you fail in between, you could still get
duplicates.