Re: Question on watermark

Ben Chambers Thu, 15 Jun 2017 14:51:27 -0700

Your understanding seems roughly correct. When people talk about the
watermark as a timestamp, i.e. a "one-dimensional" concept, they are
implicitly talking about the watermark *at the current processing time*. As
the current processing time moves forward, the value of the watermark
changes too.


There is also a requirement that the watermark only moves forward: its
value at a later processing time is never earlier than its value at an
earlier one.
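
To make the two points above concrete, here is a minimal Python sketch. It
is not Beam's API; the class name WatermarkCurve and its methods are
hypothetical, made up for illustration. It treats the watermark as a curve
of (processing time, event time) points, answers "what is the watermark at
processing time p?", and clamps each new estimate so the value never moves
backward.

```python
from bisect import bisect_right


class WatermarkCurve:
    """Hypothetical record of the watermark as a function of processing time."""

    def __init__(self):
        self._proc_times = []   # processing times, non-decreasing
        self._watermarks = []   # watermark (event time) at each of those times

    def advance(self, processing_time, candidate):
        """Record a new watermark estimate, never letting it move backward."""
        if self._proc_times and processing_time < self._proc_times[-1]:
            raise ValueError("processing time must not go backward")
        previous = self._watermarks[-1] if self._watermarks else float("-inf")
        watermark = max(previous, candidate)  # enforce "only moves forward"
        self._proc_times.append(processing_time)
        self._watermarks.append(watermark)
        return watermark

    def at(self, processing_time):
        """Return the watermark in effect at the given processing time."""
        i = bisect_right(self._proc_times, processing_time)
        return self._watermarks[i - 1] if i else float("-inf")


curve = WatermarkCurve()
curve.advance(10, 4)    # at processing time 10 the watermark is 4
curve.advance(20, 12)   # at processing time 20 it has moved to 12
curve.advance(30, 9)    # a lower estimate is held at 12, not moved back
print(curve.at(15), curve.at(30))  # prints "4 12"
```

Saying "the watermark is 12" is then shorthand for curve.at(now): a single
event-time value read off the curve at the current processing time.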

On Thu, Jun 15, 2017 at 2:39 PM Haibo Chen <haiboc...@cloudera.com> wrote:

> Hi all,
>
> While I was going over The Beam Model [model evolution]
> <https://docs.google.com/presentation/d/1SHie3nwe-pqmjGum_QDznPr-B_zXCjJ2VBDGdafZme8/edit#slide=id.g12846a6162_0_5>
> to learn the basics of the model, I found the explanation of watermark (in
> slide 27), "No timestamp earlier than the watermark will be seen" and "It
> declares that no event times earlier than this point are expected to appear
> in the future", hard to understand.
>
> From the graph in the slide, the watermark seems to be a two-dimensional
> concept, whereas timestamp (regardless of event or processing time) is
> one-dimensional. Hence, my confusion around the explanation. It seems to me
> that we can only talk about watermark in the context of processing time.
> My own interpretation of the watermark, based on the graph, is:
>
> Given a point (e, p) on the watermark curve, at processing time p, the
> system is confident (since the watermark is just a heuristic) that no
> events that happened earlier than e are expected to be seen.
>
> Is the understanding roughly correct? I plan to read the Dataflow paper to
> get a more precise understanding, but would also like to hear explanations
> in less formal terms. Any help is greatly appreciated.
>
> Best,
> Haibo Chen
>
