Re: question on commit on changelog

2015-08-25 Thread Yi Pan
Hi, Chen, The at-least-once semantics is always guaranteed by committing the offsets at the last step in the commit. Hence, flushing to local disk, changelog and output topics always need to succeed before the offsets are committed to checkpoint. If anything fails in-between, the offset will not b

Re: question on commit on changelog

2015-08-25 Thread Chen Song
Thanks Yi. Another follow up question. If 'checkpoint' and 'changelog' is managed separately (according to your explanation above), how do we support at least once processing? For example, what if the task commits offsets to checkpoint topic but the producer doesn't send all data in its buffer to

Re: question on commit on changelog

2015-08-04 Thread Yi Pan
Hi, Chen, So, is your goal to improve the throughput to the changelog topic or reduce the size of the changelog topic? If you are targeting for later and your KV-store truly is of the size of the input log, I don't see how it is possible. In a lot of use cases, users will only need to retain the *

Re: question on commit on changelog

2015-08-04 Thread Chen Song
Thanks Yan. Very good explanation on 1). For 2), I understand that users can tune the size of the batch for Kafka producer. However, that doesn't change the number of messages sent to the changelog topic. In our case, we process a high volume log (1.5MM records/second) will update kv store for e

Re: question on commit on changelog

2015-07-22 Thread Yan Fang
Hi Chen Song, There are two different concepts: *checkpoint* and *changelog*. Checkpoint is for the offset of the messages, while the changelog is for the kv-store. The code snippet you show is for the checkpoint , not for the changelog. {quote} 1. When implementing our Samza task, does each call

question on commit on changelog

2015-07-22 Thread Chen Song
We are trying to understand the order of commits when processing each message in a Samza job. T1: input offset commit T2: changelog commit T3: output commit By looking at the code snippet in https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/TaskInsta