Exactly once processing

2016-04-15 Thread Sabarish Sasidharan
Hi To achieve exactly once processing for my aggregates, wouldn’t it be enough if I maintain the latest offset processed for the aggregate and check against that offset when messages are replayed on recovery? Am I missing something here? Thanks Regards Sab
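A minimal sketch of the idea in the question above. It assumes a single input partition whose offsets increase monotonically and are processed in order. The class and field names are hypothetical, and the plain dict stands in for a persistent store; this is not Samza's store API. Each aggregate keeps the offset of the last message folded into it, written together with the value, so a replayed message can be recognised and skipped.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class OffsetDedupAggregator:
    """Hypothetical per-key sum that remembers the last input offset it applied.

    Assumes one input partition with monotonically increasing offsets.
    """
    # key -> (running total, offset of the last message folded into it)
    state: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def process(self, offset: int, key: str, value: int) -> bool:
        total, last_offset = self.state.get(key, (0, -1))
        if offset <= last_offset:
            # Replayed message that is already reflected in this aggregate.
            return False
        # Value and offset are updated in the same write.
        self.state[key] = (total + value, offset)
        return True


agg = OffsetDedupAggregator()
for msg in [(0, "a", 1), (1, "b", 10), (2, "a", 2)]:
    agg.process(*msg)

# Recovery replays from an older checkpoint (offset 0), followed by one new message.
applied = [agg.process(*msg) for msg in [(0, "a", 1), (1, "b", 10), (2, "a", 2), (3, "a", 4)]]
print(applied)    # [False, False, False, True]
print(agg.state)  # {'a': (7, 3), 'b': (10, 1)}
```

Tracking the offset per key rather than one global value works because, within one partition processed in order, every earlier message for that key has already been applied.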

Re: Exactly once processing

2016-04-15 Thread Guozhang Wang
Hi Sab, For stateful processing where you have persistent state stores, you need to keep the checkpoint, which includes the committed offsets, in sync with the store flush, but right now these two operations are not done atomically, and hence if you fail in between, you could still get d
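To make the failure window concrete, here is a toy simulation. It assumes the store is flushed before the checkpoint is written (the order described later in the thread) and that the process dies between the two. The function and variable names are hypothetical, and plain Python values stand in for the store and the checkpoint. A naive counter that does not track offsets double-counts everything replayed from the stale checkpoint.

```python
from typing import Dict, List


def run_naive(messages: List[str], restored: Dict[str, int], start_offset: int) -> Dict[str, int]:
    """Count messages per key from start_offset onward, on top of a restored store."""
    store = dict(restored)
    for key in messages[start_offset:]:
        store[key] = store.get(key, 0) + 1
    return store


messages = ["x", "y", "x"]

# First run: every message is processed and the store is flushed...
flushed_store = run_naive(messages, {}, 0)
# ...but the process dies before the checkpoint is written, so the last
# committed checkpoint still points at offset 0.
checkpoint = 0

# Restart: restore the flushed store and replay input from the checkpoint.
recovered = run_naive(messages, flushed_store, checkpoint)
print(recovered)  # {'x': 4, 'y': 2}, although the correct counts are x=2, y=1
```

Storing the last applied offset alongside each aggregate, as proposed at the start of the thread, is what lets a recovering task tell these replayed messages apart from new ones.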

Re: Exactly once processing

2016-04-15 Thread Sabarish Sasidharan
Hi Guozhang Thanks. Assuming the checkpoint would typically be behind the offset persisted in my store (+ changelog), when the messages are replayed starting from the checkpoint, I can very well skip those by comparing against the offset in my store right? So I am not understanding why duplicates

Re: Exactly once processing

2016-04-15 Thread Robert Crim
Looking at: https://github.com/apache/samza/blob/f02386464d31b5a496bb0578838f51a0331bfffa/samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala#L171 The commit function, in order, does: 1. Flushes metrics 2. Flushes stores 3. Produces messages from the collectors 4. Write offset
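The commit order listed above can be turned into a small model of what a restart sees. This is a hypothetical model, not Samza code, and the step names are made up for illustration. For a crash right after a given step, it reports whether the store is ahead of the checkpoint and whether outputs that were already produced get produced again when the uncommitted input is replayed, assuming the task re-emits the same output for each replayed message. Because the offset write comes last, a crash after step 2 but before step 4 leaves the store ahead of the checkpoint, and a crash after step 3 also repeats outputs.

```python
from typing import Dict

# Commit steps in the order given in the thread. The archived list is cut off
# after step 4, so any later steps are not modelled.
COMMIT_STEPS = ["flush_metrics", "flush_stores", "produce_outputs", "write_offsets"]


def after_crash(steps_completed: int) -> Dict[str, bool]:
    """What a restart sees if the process dies after the first `steps_completed` steps.

    Hypothetical model: assumes the task re-emits the same output for every
    replayed input message, i.e. it does no offset-based deduplication.
    """
    done = set(COMMIT_STEPS[:steps_completed])
    offsets_written = "write_offsets" in done
    return {
        # Replay starts from the old checkpoint while the store already holds newer state.
        "store_ahead_of_checkpoint": "flush_stores" in done and not offsets_written,
        # Outputs already produced are produced again when the input is replayed.
        "duplicate_outputs": "produce_outputs" in done and not offsets_written,
    }


for n in range(len(COMMIT_STEPS) + 1):
    print(n, after_crash(n))
```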

Review Request 46282: SAMZA-928 document Kerberos on YARN

2016-04-15 Thread Chen Song
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/46282/ --- Review request for samza. Repository: samza Description --- SAMZA-928 do

Re: Review Request 46282: SAMZA-928 document Kerberos on YARN

2016-04-15 Thread Chen Song
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/46282/ --- (Updated April 15, 2016, 10:09 p.m.) Review request for samza. Repository: sa

Review Request 46287: Add a double serde.

2016-04-15 Thread Jon Bringhurst
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/46287/ --- Review request for samza. Bugs: SAMZA-936 https://issues.apache.org/jira/br
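The review request only names the feature. As a hedged illustration of what a double serde has to do, here is a sketch that encodes a float as 8 bytes in big-endian IEEE 754 form. That byte order is an assumption (it matches the default order of Java's ByteBuffer), passing None through is an assumed convention, the class and method names are hypothetical, and the encoding actually used by the SAMZA-936 patch is not visible in this excerpt.

```python
import struct
from typing import Optional


class DoubleSerde:
    """Hypothetical double serde using an 8-byte big-endian IEEE 754 encoding."""

    _FMT = ">d"  # big-endian double, same byte order as Java's ByteBuffer default

    def to_bytes(self, value: Optional[float]) -> Optional[bytes]:
        # None is passed through unchanged (an assumed convention).
        return None if value is None else struct.pack(self._FMT, value)

    def from_bytes(self, data: Optional[bytes]) -> Optional[float]:
        if data is None:
            return None
        if len(data) != 8:
            raise ValueError(f"expected 8 bytes, got {len(data)}")
        return struct.unpack(self._FMT, data)[0]


serde = DoubleSerde()
print(serde.to_bytes(1.0).hex())               # 3ff0000000000000
print(serde.from_bytes(serde.to_bytes(3.5)))   # 3.5
```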

Re: Review Request 46287: Add a double serde.

2016-04-15 Thread Jake Maes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/46287/#review129210 --- Ship it! Ship It! - Jake Maes On April 15, 2016, 11:17 p.m.,

Review Request 46296: SAMZA-932: JMX port collisions in JmxServer

2016-04-15 Thread Tao Feng
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/46296/ --- Review request for samza. Repository: samza Description --- SAMZA-932: J