I see. But yes, even in that case the watermark will always be "one behind". The logic in the extraction operator is roughly this:
1. Extract timestamp T, assign to internal StreamRecord
2. Send StreamRecord downstream
3. Extract Watermark W
4. Send Watermark downstream

(In your case T == W)

The reason is that a watermark T says that there will not be an element with a timestamp <= T in the future. If the watermark were sent before the record then this would violate the watermark contract, i.e. your element with timestamp T would arrive after the watermark W.

I think it's not easily possible to have a properly defined watermark for the very first element in a stream, unfortunately.

Best,
Aljoscha

> On 4. Aug 2017, at 16:43, Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> wrote:
>
> We're using an AssignerWithPunctuatedWatermarks that extracts a timestamp from
> the data. It keeps and returns the highest timestamp it has ever seen and
> returns a new Watermark every time the value grows.
>
> I know it's bad for performance, but for the moment that's not the issue; I
> want the most up-to-date watermark possible.
>
> -----Original Message-----
> From: Aljoscha Krettek [mailto:aljos...@apache.org]
> Sent: Friday, 4 August 2017 12:22
> To: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
> Cc: Nico Kruber <n...@data-artisans.com>; user@flink.apache.org
> Subject: Re: Event-time and first watermark
>
> Hi,
>
> How are you defining the watermark, i.e. what kind of watermark extractor are
> you using?
>
> Best,
> Aljoscha
>
>> On 3. Aug 2017, at 17:45, Gwenhael Pasquiers
>> <gwenhael.pasqui...@ericsson.com> wrote:
>>
>> We're not using a Window but a more basic ProcessFunction to handle
>> sessions. We made this choice because we have to handle (millions of)
>> sessions that can last from 10 seconds to 24 hours so we wanted to handle
>> things manually using the State class.
>>
>> We're using the watermark as an event-time "clock" to:
>> * compute "lateness" of a message relative to the watermark (most
>> recent message from the stream)
>> * fire timer events
>>
>> We're using event-time instead of processing time because our stream will be
>> late and data arrives in hourly bursts.
>>
>> Maybe we're misusing the watermark?
>>
>> B.R.
>>
>> -----Original Message-----
>> From: Nico Kruber [mailto:n...@data-artisans.com]
>> Sent: Thursday, 3 August 2017 16:30
>> To: user@flink.apache.org
>> Cc: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
>> Subject: Re: Event-time and first watermark
>>
>> Hi Gwenhael,
>> "A Watermark(t) declares that event time has reached time t in that
>> stream, meaning that there should be no more elements from the stream
>> with a timestamp t’ <= t (i.e. events with timestamps older or equal
>> to the watermark)." [1]
>>
>> Therefore, they should be behind the actual event with timestamp t.
>>
>> What is it that you want to achieve in the end? What do you want to use the
>> watermark for? They are basically a means of defining when an event time
>> window ends.
>>
>>
>> Nico
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html#event-time-and-watermarks
>>
>> On Thursday, 3 August 2017 10:24:35 CEST Gwenhael Pasquiers wrote:
>>> Hi,
>>>
>>> From my tests it seems that the initial watermark value is
>>> Long.MIN_VALUE even though my first data passed through the timestamp
>>> extractor before arriving into my ProcessFunction. It looks like the
>>> watermark "lags" behind the data by one message.
>>>
>>> Is there a way to have a watermark more "up to date"? Or is the only
>>> way to compute it myself in my ProcessFunction?
>>>
>>> Thanks.
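
To make the ordering described at the top of this thread concrete (record first, watermark second, with T == W), here is a minimal plain-Java simulation. It is only a sketch and deliberately does not use any Flink classes: the class name WatermarkOrderingSketch, the Observation type and the run method are all hypothetical names invented for illustration. It also tracks the highest timestamp seen so far on the downstream side, which is one possible reading of the "compute it myself" idea from the original question. Note that such a self-tracked value would only be a local clock for lateness checks; it is not a real watermark and would not affect when event-time timers fire.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation of the emit order; not Flink code, all names are hypothetical. */
final class WatermarkOrderingSketch {

    /** What a downstream function would observe while processing one element. */
    static final class Observation {
        final long timestamp;        // timestamp T assigned to the record
        final long currentWatermark; // watermark visible when the record is processed
        final long selfTrackedMax;   // highest timestamp seen so far, tracked downstream

        Observation(long timestamp, long currentWatermark, long selfTrackedMax) {
            this.timestamp = timestamp;
            this.currentWatermark = currentWatermark;
            this.selfTrackedMax = selfTrackedMax;
        }
    }

    /** Replays the four steps from the thread for each timestamp, in arrival order. */
    static List<Observation> run(long[] timestamps) {
        List<Observation> observations = new ArrayList<>();
        long downstreamWatermark = Long.MIN_VALUE; // initial value reported in the thread
        long assignerMax = Long.MIN_VALUE;         // highest timestamp the assigner has seen
        long selfTrackedMax = Long.MIN_VALUE;      // downstream's own "clock"

        for (long t : timestamps) {
            // Steps 1 and 2: the record gets its timestamp and is sent downstream first.
            selfTrackedMax = Math.max(selfTrackedMax, t);
            observations.add(new Observation(t, downstreamWatermark, selfTrackedMax));

            // Steps 3 and 4: only afterwards is the watermark extracted and sent.
            // As in the thread, a new watermark is emitted only when the value grows.
            if (t > assignerMax) {
                assignerMax = t;
                downstreamWatermark = assignerMax;
            }
        }
        return observations;
    }

    public static void main(String[] args) {
        for (Observation o : run(new long[] {10L, 20L, 15L, 30L})) {
            System.out.println("T=" + o.timestamp
                    + " watermark=" + o.currentWatermark
                    + " selfTrackedMax=" + o.selfTrackedMax);
        }
    }
}
```

With the sample input in main, the first element is processed while the watermark is still Long.MIN_VALUE and the second element sees a watermark of 10, which matches the "one behind" behaviour described above, while the self-tracked maximum already equals the element's own timestamp.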