Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-09 Thread Fabian Hueske
Hi, Please find my response below. Am Fr., 3. Mai 2019 um 16:16 Uhr schrieb an0 : > Thanks, but it does't seem covering this rule: > --- Quote > Watermarks are generated at, or directly after, source functions. Each > parallel subtask of a source function usually generates its watermarks > indep

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-03 Thread an0
Thanks, but it does't seem covering this rule: --- Quote Watermarks are generated at, or directly after, source functions. Each parallel subtask of a source function usually generates its watermarks independently. These watermarks define the event time at that particular parallel source. As the

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-03 Thread Fabian Hueske
Hi, this should be covered here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html#watermarks-in-parallel-streams Best, Fabian Am Do., 2. Mai 2019 um 17:48 Uhr schrieb an0 : > This explanation is exactly what I'm looking for, thanks! Is such an > important rule doc

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-05-02 Thread an0
This explanation is exactly what I'm looking for, thanks! Is such an important rule documented anywhere in the official document? On 2019/04/30 08:47:29, Fabian Hueske wrote: > An operator task broadcasts its current watermark to all downstream tasks > that might receive its records. > If you h

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-30 Thread Fabian Hueske
An operator task broadcasts its current watermark to all downstream tasks that might receive its records. If you have an the following code: DataStream a = ... a.map(A).map(B).keyBy().window(C) and execute this with parallelism 2, your plan looks like this A.1 -- B.1 --\--/-- C.1

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-29 Thread M Singh
Hi An0: Here is my understanding - each operator has the watermark which is the lowest of all it's input streams. When the watermark for an operator is updated, the lowest one becomes the new watermark for that operator and is fowarded to the output streams for that operator.  So, if one of the

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-29 Thread an0
Thanks very much. It definitely explains the problem I'm seeing. However, something I need to confirm: You say "Watermarks are broadcasted/forwarded anyway." Do you mean, in assingTimestampsAndWatermarks.keyBy.window, it doesn't matter what data flows through a specific key's stream, all key str

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-25 Thread Dawid Wysakowicz
Hi, Watermarks are meta events that travel independently of data events. 1) If you assingTimestampsAndWatermarks before keyBy, all parallel instances of trips have some data(this is my assumption) so Watermarks can be generated. Afterwards even if some of the keyed partitions have no data, Waterm

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-25 Thread an0
If my understanding is correct, then why `assignTimestampsAndWatermarks` before `keyBy` works? The `timeWindowAll` stream's input streams are task 1 and task 2, with task 2 idling, no matter whether `assignTimestampsAndWatermarks` is before or after `keyBy`, because whether task 2 receives eleme

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-25 Thread Dawid Wysakowicz
Hi, Yes I think your explanation is correct. I can also recommend Seth's webinar where he talks about debugging Watermarks[1] Best, Dawid [1] https://www.ververica.com/resources/webinar/webinar/debugging-flink-tutorial On 22/04/2019 22:55, an0 wrote: > Thanks, I feel I'm getting closer to the

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-22 Thread an0
Thanks, I feel I'm getting closer to the truth. So parallelism is the cause? Say my parallelism is 2. Does that mean I get 2 tasks running after `keyBy` if even all elements have the same key so go to 1 down stream(say task 1)? And it is the other task(task 2) with no incoming data that caused

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-21 Thread Guowei Ma
HI, BoundedOutOfOrdernessTimestampExtractors can send a WM at least after it receives an element. For after Keyby: Flink uses the HashCode of key and the parallelism of down stream to decide which subtask would receive the element. This means if your key is always same, all the sources will only

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-19 Thread an0
Hi, First of all, thank you for the `shuffle()` tip. It works. However, I still don't understand why it doesn't work without calling `shuffle()`. Why would not all BoundedOutOfOrdernessTimestampExtractors receive trips? All the trips has keys and timestamps. As I said in my reply to Paul, I se

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-18 Thread Guowei Ma
Hi, After keyby maybe only some of BoundedOutOfOrdernessTimestampExtractors could receive the elements(trip). If that is the case BoundedOutOfOrdernessTimestampExtractor, which does not receive element would not send the WM. Since that the timeWindowAll operator could not be triggered. You could ad

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-18 Thread an00na
I don't think it is the watermark. I see the same watermarks from the two versions of code. The processing on the keyed stream doesn't change event time at all. I can simply change my code to use `map` on the keyed stream to return back the input data, so that the window operator receives the e

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-17 Thread Paul Lam
Hi, Could you check the watermark of the window operator? One possible situation would be some of the keys are not getting enough inputs, so their watermarks remain below the window end time and hold the window operator watermark back. IMO, it’s a good practice to assign watermark earlier in th