Roger: That was an extremely clear and helpful talk, the described approach
fits my usecase nearly perfectly, thanks!

Yi: I'd love to give feedback, I'll add myself to the task.

On Mon, Feb 23, 2015 at 6:21 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hey, Geoffry,
>
> We have started some work in SAMZA-552 to create a window operator API in
> samza, as part of effort to implement support for a high-level language. I
> will probably be able to have something to share in a few days and would
> love to get feedbacks regarding to the window operator.
>
> Thanks!
>
> On Mon, Feb 23, 2015 at 4:49 PM, Geoffry Sumter <vit...@gmail.com> wrote:
>
> > Hey everyone,
> >
> > I've been thinking about reprocessing
> > <
> http://samza.apache.org/learn/documentation/0.7.0/jobs/reprocessing.html>
> > when
> > my job has windowed state
> > <
> >
> http://samza.apache.org/learn/documentation/0.7.0/container/state-management.html#windowed-aggregation
> > >
> > and
> > I have a few questions.
> >
> > Context: I have a series of physical sensors that stream partial scans of
> > their surroundings over the course of ~5-10 minutes (at the end of 5-10
> > minutes its provided a full scan of its surroundings and starts over from
> > the start). Each packet of data has a timestamp that we're just going to
> > have to trust is 'close enough.' When processing in real-time it's very
> > natural to window the data every 5 minutes (wall clock) and merge into a
> > complete view of their collective surroundings. For our purposes, old
> data
> > arriving > 5 minutes late is no longer useful for many applications.
> >
> > Now, I'd love to be able to reprocess data, especially by increasing
> > parallelism and processing quickly, but this seems incompatible with
> using
> > wall-clock-based windowed state. I could base my windowing/binning on the
> > timestamps provided by my input data, but then I have to be careful to
> > handle cases where some of my data may be arbitrarily delayed and the
> > possibility that one partition will get significantly ahead of other ones
> > (less interesting surroundings and faster computations) which makes
> waiting
> > for a majority of partitions to get to a certain point in time difficult.
> >
> > Does anyone have experience with windowing and reprocessing? Any
> literature
> > recommendations?
> >
> > Thanks!
> > Geoffry
> >
>

Reply via email to