Hi, Zach, We try to accommodate the low-watermark concept w/o depending on the embedded low-watermark messages in the streams. The dependency on low-watermark messages in the streams is a concern for us due to the following two reasons: 1) The injection of low-watermark messages depends on the producer, or the injection point to insert the low-watermark messages periodically. 2) The delivery of low-watermark messages could be delayed by the underlying messaging system as well, causing the stream processing be stalled if totally depending on the arrival of low-watermark to close the window.
So, we choose to adopt the concept of low-watermark being the indicator of the end of window in a certain stream w/o depending on the low-watermark message. The full-size policy in the design is used to determine the end of the window, in the absence of low-watermark messages. In short, yes, the window operator compute the low-watermark from timestamps on incoming events, based on the full-size policy. Love to hear your feedback/comments on this. Thanks! -Yi On Tue, Jan 19, 2016 at 9:17 AM, Zach Cox <zcox...@gmail.com> wrote: > Hi Yi, > > I read through the SAMZA-552 design [1] and have some questions: is the > low-watermark concept included in the Window Metadata? Does the window > operator compute the low-watermark from timestamps on incoming events? > > Thanks, > Zach > > [1] > > https://issues.apache.org/jira/secure/attachment/12708934/DESIGN-SAMZA-552-7.pdf > > > On Wed, Jan 13, 2016 at 12:23 AM Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Zach, > > > > Glad that you pointed it out! Actually, the design description in > SAMZA-552 > > has adopt a lot of flavors of high-watermark, late-arrivals, from > > MillWheel. The terms used in the design doc maybe different since the > terms > > in the doc were used earlier than we discovered the MillWheel > presentation. > > But in essense, the goal of SAMZA-552 (i.e. mainly, the windowing > technique > > described there) is targeted to implement those concepts of > > high-watermark/late-arrivals in Samza. > > > > We are planning to move forward in SAMZA-552 and are more than happy to > > discuss it in much more details if you are interested. > > > > Thanks! > > > > -Yi > > > > On Tue, Jan 12, 2016 at 3:08 PM, Zach Cox <zcox...@gmail.com> wrote: > > > > > I'm curious - has anyone built any Samza-based systems that use any > > notion > > > of stream progress, e.g. low watermarks, punctuations, or heartbeats? > > These > > > are described in the stream-processing literature [1] [2] [3] and > > > implemented in MillWheel [4] and Dataflow [5] but I have not seen any > > > mention of these techniques related to Samza (except for briefly in > > > Samza-552 [6]). > > > > > > The purpose of something like a low watermark would include handling > > > out-of-order events, outputting the result of a stateful operation > after > > > all relevant events have been processed, and cleaning up internal state > > > that will never again be updated to avoid unbounded growth. > > > > > > Just wondering if techniques like these would be useful in Samza job > > > pipelines, or if there are various approaches in Samza that make them > > > unnecessary. > > > > > > Thanks, > > > Zach > > > > > > [1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1198390 > > > [2] http://dl.acm.org/citation.cfm?id=1055596 > > > [3] http://dl.acm.org/citation.cfm?id=1453890 > > > [4] http://research.google.com/pubs/pub41378.html > > > [5] http://research.google.com/pubs/pub43864.html > > > [6] https://issues.apache.org/jira/browse/SAMZA-552 > > > > > >