Hey Jae,

If I understand you correctly, your concern is that there can be flushes in
between commits. For example:

  T=30s; flush
  T=45s; flush
  T=60s; flush && commit
  T=65s; flush

Your concern here is that if the container fails before the commit at T=60s,
the messages flushed at T=30s and T=45s will be sent again when the container
restarts and reprocesses from the last checkpoint, right?

> Never mind. I found a solution. Flush should be synced with commit.

Could you elaborate on this?
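
To make sure I'm reading you right, is it something like the rough sketch
below? This is completely untested and the wrapper name is made up; the idea
is just that send() only buffers in memory, and flush(), which I believe the
container already calls as part of commit(), becomes the only point where
data actually leaves the container.

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  import org.apache.samza.system.OutgoingMessageEnvelope;
  import org.apache.samza.system.SystemProducer;

  // Sketch only: buffer sends in memory and hand them to the wrapped
  // producer only when flush() is called from the commit path, so nothing
  // reaches the downstream system between checkpoints.
  public class CommitSyncedProducer implements SystemProducer {
    private final SystemProducer wrapped;
    private final Map<String, List<OutgoingMessageEnvelope>> buffers =
        new HashMap<String, List<OutgoingMessageEnvelope>>();

    public CommitSyncedProducer(SystemProducer wrapped) {
      this.wrapped = wrapped;
    }

    public void start() { wrapped.start(); }

    public void stop() { wrapped.stop(); }

    public void register(String source) {
      buffers.put(source, new ArrayList<OutgoingMessageEnvelope>());
      wrapped.register(source);
    }

    public void send(String source, OutgoingMessageEnvelope envelope) {
      // No early flush on batch size: just buffer until the next commit.
      buffers.get(source).add(envelope);
    }

    public void flush(String source) {
      // Driven by the commit path, so buffered messages only go out here.
      List<OutgoingMessageEnvelope> buffer = buffers.get(source);
      for (OutgoingMessageEnvelope envelope : buffer) {
        wrapped.send(source, envelope);
      }
      wrapped.flush(source);
      buffer.clear();
    }
  }

The obvious cost would be that everything sits in memory until the next
checkpoint, so the commit interval bounds how large the buffer can grow. Is
that the kind of trade-off you had in mind?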

Cheers,
Chris

On Thu, Jan 29, 2015 at 12:27 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:

> Never mind. I found a solution. Flush should be synced with commit.
>
> On Thu, Jan 29, 2015 at 12:15 AM, Bae, Jae Hyeon <metac...@gmail.com>
> wrote:
>
> > Hi Samza Devs
> >
> > A StreamTask can trigger SamzaContainer.commit() through the task
> > coordinator. Can we make SystemProducer trigger a commit right after a
> > flush? With this feature, we could prevent duplicate data on
> > SamzaContainer failure.
> >
> > For example, suppose we set the commit interval to 2 minutes. Before the
> > commit interval expires, whenever its buffer grows larger than the batch
> > size, the SystemProducer will flush the data in the buffer. If the
> > container dies right after such a flush, another container will start
> > from the previous commit and reprocess those messages, so we will have
> > duplicate data.
> >
> > The longer the commit interval, the more duplicate data we will have. I
> > know this is not a big deal, because container failure will be a rare
> > case and only a few minutes of data will be duplicated. But I would be
> > happy if we could clear up this little concern.
> >
> > Any idea?
> >
> > Thank you
> > Best, Jae
> >
>
