I see. But yes, even in the case the watermark will always be "one behind". The 
logic in the extraction operator is roughly this:

 1. Extract timestamp T, assign to internal StreamRecord
 2. Send StreamRecord downstream
 3. Extract Watermark W
 4. Send Watermark downstream

(In your case T == W)

The reason is that a watermark T says that there will not be an element with a 
timestamp <= T in the future. If the watermark were sent before the record then 
this would violate the watermark contract, i.e. your element with timestamp T 
would arrive after the watermark W. I think it's not easily possible to have a 
properly defined watermark for the very first element in a stream, 
unfortunately.

Best,
Aljoscha 
> On 4. Aug 2017, at 16:43, Gwenhael Pasquiers 
> <gwenhael.pasqui...@ericsson.com> wrote:
> 
> We're using a AssignerWithPunctuatedWatermarks that extracts a timestamp from 
> the data. It keeps and returns the higher timestamp it has ever seen and 
> returns a new Watermark everytime the value grows.
> 
> I know it's bad for performances, but for the moment it's not the issue, i 
> want the most possibly up-to-date watermark.
> 
> -----Original Message-----
> From: Aljoscha Krettek [mailto:aljos...@apache.org] 
> Sent: vendredi 4 août 2017 12:22
> To: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
> Cc: Nico Kruber <n...@data-artisans.com>; user@flink.apache.org
> Subject: Re: Event-time and first watermark
> 
> Hi,
> 
> How are you defining the watermark, i.e. what kind of watermark extractor are 
> you using?
> 
> Best,
> Aljoscha
> 
>> On 3. Aug 2017, at 17:45, Gwenhael Pasquiers 
>> <gwenhael.pasqui...@ericsson.com> wrote:
>> 
>> We're not using a Window but a more basic ProcessFunction to handle 
>> sessions. We made this choice because we have to handle (millions of) 
>> sessions that can last from 10 seconds to 24 hours so we wanted to handle 
>> things manually using the State class.
>> 
>> We're using the watermark as an event-time "clock" to:
>> * compute "lateness" of a message relatively to the watermark (most 
>> recent message from the stream)
>> * fire timer events
>> 
>> We're using event-time instead of processing time because our stream will be 
>> late and data arrive by hourly bursts.
>> 
>> Maybe we're misusing the watermark ?
>> 
>> B.R.
>> 
>> -----Original Message-----
>> From: Nico Kruber [mailto:n...@data-artisans.com]
>> Sent: jeudi 3 août 2017 16:30
>> To: user@flink.apache.org
>> Cc: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
>> Subject: Re: Event-time and first watermark
>> 
>> Hi Gwenhael,
>> "A Watermark(t) declares that event time has reached time t in that 
>> stream, meaning that there should be no more elements from the stream 
>> with a timestamp t’ <= t (i.e. events with timestamps older or equal 
>> to the watermark)." [1]
>> 
>> Therefore, they should be behind the actual event with timestamp t.
>> 
>> What is it that you want to achieve in the end? What do you want to use the 
>> watermark for? They are basically a means to defining when an event time 
>> window ends.
>> 
>> 
>> Nico
>> 
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/
>> event_time.html#event-time-and-watermarks
>> 
>> On Thursday, 3 August 2017 10:24:35 CEST Gwenhael Pasquiers wrote:
>>> Hi,
>>> 
>>> From my tests it seems that the initial watermark value is 
>>> Long.MIN_VALUE even though my first data passed through the timestamp 
>>> extractor before arriving into my ProcessFunction. It looks like the 
>>> watermark "lags" behind the data by one message.
>>> 
>>> Is there a way to have a watermark more "up to date" ? Or is the only 
>>> way to compute it myself into my ProcessFunction ?
>>> 
>>> Thanks.
>> 
> 

Reply via email to