Re: Watermarks

2023-10-02 Thread Perez
As per this link, it says that it only supports value_only for now as I am using pyflink. Does it mean I can't extract the timestamp appended by Kafka with pyflink as of now? https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/kafka/#deserializer or does it mean s

Re: Watermarks

2023-10-02 Thread Perez
Hi Liu and Jinfeng, I am trying to implement KafkaDeserializationSchema for Pyflink but am unable to get any examples. Can you share some links or references using which I can understand and try to implement myself? Perez sid.

Re: Watermarks

2023-09-13 Thread Perez
Cool thanks for the clarification. Sid. On Mon, Sep 11, 2023 at 9:22 AM liu ron wrote: > Hi, Sid > > For the second question, I think it is not needed. > > Best, > Ron > > Feng Jin 于2023年9月9日周六 21:19写道: > >> hi Sid >> >> >> 1. You can customize KafkaDeserializationSchema[1], in the `deserializ

Re: Watermarks

2023-09-10 Thread liu ron
Hi, Sid For the second question, I think it is not needed. Best, Ron Feng Jin 于2023年9月9日周六 21:19写道: > hi Sid > > > 1. You can customize KafkaDeserializationSchema[1], in the `deserialize` > method, you can obtain the Kafka event time. > > 2. I don't think it's necessary to explicitly mention

Re: Watermarks

2023-09-09 Thread Feng Jin
hi Sid 1. You can customize KafkaDeserializationSchema[1], in the `deserialize` method, you can obtain the Kafka event time. 2. I don't think it's necessary to explicitly mention the watermark strategy. [1]. https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connec

Re: Watermarks lagging behind events that generate them

2023-03-16 Thread David Anderson
Watermarks are not included in checkpoints or savepoints. See [1] for some head-scratchingly-complicated info about restarts, watermarks, and unaligned checkpoints. [1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/#interplay-with-wate

Re: Watermarks lagging behind events that generate them

2023-03-15 Thread Shammon FY
Hi Alexis Currently I think checkpoint and savepoint will not save watermarks. I think how to deal with watermarks at checkpoint/savepoint is a good question, we can discuss this in dev mail list Best, Shammon FY On Wed, Mar 15, 2023 at 4:22 PM Alexis Sarda-Espinosa < sarda.espin...@gmail.com>

Re: Watermarks lagging behind events that generate them

2023-03-15 Thread Alexis Sarda-Espinosa
Hi Shammon, thanks for the info. I was hoping the savepoint would include the watermark, but I'm not sure that would make sense in every scenario. Regards, Alexis. Am Di., 14. März 2023 um 12:59 Uhr schrieb Shammon FY : > Hi Alexis > > In some watermark generators such as BoundedOutOfOrderTimest

Re: Watermarks lagging behind events that generate them

2023-03-14 Thread Shammon FY
Hi Alexis In some watermark generators such as BoundedOutOfOrderTimestamps, the timestamp of watermark will be reset to Long.MIN_VALUE if the subtask is restarted and no event from source is processed. Best, Shammon FY On Tue, Mar 14, 2023 at 4:58 PM Alexis Sarda-Espinosa < sarda.espin...@gmail.

Re: Watermarks lagging behind events that generate them

2023-03-14 Thread Alexis Sarda-Espinosa
Hi David, thanks for the answer. One follow-up question: will the watermark be reset to Long.MIN_VALUE every time I restart a job with savepoint? Am Di., 14. März 2023 um 05:08 Uhr schrieb David Anderson < dander...@apache.org>: > Watermarks always follow the corresponding event(s). I'm not sure

Re: Watermarks lagging behind events that generate them

2023-03-13 Thread David Anderson
Watermarks always follow the corresponding event(s). I'm not sure why they were designed that way, but that is how they are implemented. Windows maintain this contract by emitting all of their results before forwarding the watermark that triggered the results. David On Mon, Mar 13, 2023 at 5:28 P

Re: Watermarks lagging behind events that generate them

2023-03-13 Thread Shammon FY
Hi Alexis Do you use both event-time watermark generator and TimerService for processing time in your job? Maybe you can try using event-time watermark first. Best, Shammon.FY On Sat, Mar 11, 2023 at 7:47 AM Alexis Sarda-Espinosa < sarda.espin...@gmail.com> wrote: > Hello, > > I recently ran in

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
atermark strategy > - and replaces it with something that is arbitrary (at this point > it is hard to guess the correct max lateness that is a mixture of the > events from multiple Kafka partitions) > > > > Concusion: > > The only way to make the event tim

RE: Watermarks event time vs processing time

2022-03-29 Thread Schwalbe Matthias
manner. Hope this helps Thias From: HG Sent: Tuesday, March 29, 2022 1:07 PM To: Schwalbe Matthias Cc: user Subject: Re: Watermarks event time vs processing time ⚠EXTERNAL MESSAGE – CAUTION: Think Before You Click ⚠ Hello Matthias, When I remove all the watermark strategies it does not

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
Hello Matthias, When I remove all the watermark strategies it does not process anything . For example when I use WatermarkStrategy.noWatermarks() instead of the one I build nothing seems to happen at all. Also when I skip the part where I add wmStrategy to create tuple4dswm: DataStream> tuple4

RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
ing of the watermarks on single operators / per subtask useful: Look for subtasks that don’t have watermarks, or too low watermarks for a specific session window to trigger. Thias From: HG Sent: Mittwoch, 16. März 2022 16:41 To: Schwalbe Matthias Cc: user Subject: Re: Watermarks event time

Re: Watermarks event time vs processing time

2022-03-16 Thread HG
Hi Matthias and others Thanks for the answer. I will remove the Idleness. However I am not doing max/min etc. Unfortunately most examples are about aggregations. The inputs are like this {"handling_time":1646992800260,"transaction_id":"017f6af1548e-119dfb",} {"handling_time":164699280

RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
Hi Hanspeter, Let me relate some hints that might help you getting concepts clearer. From your description I make following assumptions where your are not specific enough (please confirm or correct in your answer): 1. You store incoming events in state per transaction_id to be sorted/aggreg

Re: Watermarks in Event Time Temporal Join

2021-04-28 Thread Leonard Xu
Thanks for your example, Maciej I can explain more about the design. > Let's have events. > S1, id1, v1, 1 > S2, id1, v2, 1 > > Nothing is happening as none of the streams have reached the watermark. > Now let's add > S2, id2, v2, 101 > This should trigger join for id1 because we have all the kno

Re: Watermarks in Event Time Temporal Join

2021-04-28 Thread Maciej Bryński
Hi Leonard, Let's assume we have two streams. S1 - id, value1, ts1 with watermark = ts1 - 1 S2 - id, value2, ts2 with watermark = ts2 - 1 Then we have following interval join SELECT id, value1, value2, ts1 FROM S1 JOIN S2 ON S1.id = S2.id and ts1 between ts2 - 1 and ts2 Let's have events. stream,

Re: Watermarks in Event Time Temporal Join

2021-04-26 Thread Leonard Xu
Hello, Maciej > I agree the watermark should pass on versioned table side, because > this is the only way to know which version of record should be used. > But if we mimics behaviour of interval join then main stream watermark > could be skipped. IIRC, rowtime interval join requires the watermark

Re: Watermarks in Event Time Temporal Join

2021-04-25 Thread Maciej Bryński
Hi Shengkai, Thanks for the answer. The question is do we need to determine if an event in the main stream is late. Let's look at interval join - event is emitted as soon as there is a match between left and right stream. I agree the watermark should pass on versioned table side, because this is th

Re: Watermarks in Event Time Temporal Join

2021-04-25 Thread Shengkai Fang
Hi, maverick. The watermark is used to determine the message is late or early. If we only use the watermark on versioned table side, we have no means to determine whether the event in the main stream is ready to emit. Best, Shengkai maverick 于2021年4月26日周一 上午2:31写道: > Hi, > I'm curious why Even

Re: Watermarks on map operator

2021-02-05 Thread David Anderson
Basically the only thing that Watermarks do is to trigger event time timers. Event time timers are used explicitly in KeyedProcessFunctions, but are also used internally by time windows, CEP (to sort the event stream), in various time-based join operations, and within the Table/SQL API. If you wan

Re: Watermarks on map operator

2021-02-04 Thread Kezhu Wang
> it is not clear to me if watermarks are also used by map/flatmpat operators or just by window operators. Watermarks are most liked only used by timing segmented aggregation operator to trigger result materialization. In streaming, this “timing segmentation” is usually called “windowing”, so in t

Re: Watermarks and parallelism

2020-05-19 Thread Arvid Heise
asset 5 data not be dropped >considering it as a late date? > > > > Regards, > > Gnana > > > > *From: *Arvid Heise > *Date: *Monday, 18 May 2020 at 4:59 PM > *To: *Gnanasoundari Soundarajan > *Cc: *Alexander Fedulov , "user@flink.apache.org" >

Re: Watermarks and parallelism

2020-05-19 Thread Gnanasoundari Soundarajan
, 18 May 2020 at 4:59 PM To: Gnanasoundari Soundarajan Cc: Alexander Fedulov , "user@flink.apache.org" Subject: Re: Watermarks and parallelism Hi Gnanasoundari, Your use case is very typical and pretty much the main motivation for event time and watermarks. It's supported ou

Re: Watermarks and parallelism

2020-05-18 Thread Arvid Heise
and hence data didn’t flow downstream. > > > > I am curious to know that whether will it be a feasible requirement to > achieve it in flink using event time? > > > > Regards, > > Gnana > > > > *From: *Alexander Fedulov > *Date: *Thursday, 14 May 2020 at

Re: Watermarks and parallelism

2020-05-15 Thread Gnanasoundari Soundarajan
data didn’t flow downstream. I am curious to know that whether will it be a feasible requirement to achieve it in flink using event time? Regards, Gnana From: Alexander Fedulov Date: Thursday, 14 May 2020 at 9:25 PM To: Gnanasoundari Soundarajan Cc: "user@flink.apache.org" S

Re: Watermarks and parallelism

2020-05-14 Thread Alexander Fedulov
Hi Gnana, 1. No, watermarks are generated independently per subtask. I think this section of the docs might make things more clear - [1] . 2. The same watermark from the

Re: Watermarks and Kafka

2019-07-08 Thread Juan Gentile
the watermarks handled in the source operator. Please let us know your opinion. Thank you, Juan G. From: Konstantin Knauf Date: Sunday, July 7, 2019 at 10:14 PM To: Juan Gentile Cc: "user@flink.apache.org" , Olivier Solliec , Oleksandr Nitavskyi Subject: Re: Watermarks and Kafka H

Re: Watermarks and Kafka

2019-07-07 Thread Konstantin Knauf
Hi Juan, I just replied to your other question, but I think, I better get where you are coming from now. Are you aware of per-partition watermarking [1]? You don't need to manage this map yourself. BUT: this does not solve the problem, that this Map is not stored in Managed State. Watermarks are

Re: Watermarks in Event Time windowing

2018-09-14 Thread David Anderson
To clarify one thing: keep in mind that Flink does not support per-key watermarks. Watermarks are typically generated per-source, or in the case of kafka, can be per-partition. An idle source (or in the case of kafka, an idle partition) can prevent an event-time window from being triggered, but you

Re: Watermarks in Event Time windowing

2018-09-13 Thread Taher Koitawala
Yes in many cases what we have faced that let's say in a keyed stream an element of a specific key comes in which triggers a new window. If a corresponding elements of the same key does not arrive a new watermark is not generated for the window to purge. Then we faced issues with flink keeping reco

Re: Watermarks in Event Time windowing

2018-09-13 Thread vino yang
Hi Taher, For some questions, I suggest you read the documentation related to Flink EventTime first, for example [1] About this question: What happens if the watermark is same as the timestamp? Here "timestamp", do you mean the current timestamp of Processing time? If that's the best, it's an id

Re: Watermarks per key

2017-02-21 Thread Aljoscha Krettek
Hi, managing a per-key watermark would require keeping to current watermark for each key, for example at the sources or in a timestamp/watermark assigner. The problem then is figuring out when you can discard that state because it would otherwise grow indefinitely if you have an evolving key space.

Re: Watermarks per key

2017-02-20 Thread jganoff
There's nothing stopping me assigning timestamps and generating watermarks on a keyed stream in the implementation and the KeyedStream API supports it. It appears the underlying operator that gets created in DataStream.assignTimestampsAndWatermarks() isn't key-aware and globally tracks timestamps.

Re: Watermarks per key

2017-02-15 Thread Fabian Hueske
Hi Jordan, it is not possible to generate watermarks per key. This feature has been requested a couple of times but I think there are no plans to implement that. As far as I understand, the management of watermarks would be quite expensive (maintaining several watermarks, purging watermarks of exp

Re: Watermarks and window firing

2016-10-26 Thread Robert Metzger
Just for others who are wondering what this email is about: I suspect that this email was send accidentally and that this is the correct one: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Event-time-watermarks-and-windows-td9687.html On Mon, Oct 24, 2016 at 4:56 PM, Paul Joir

Re: Watermarks with repartition

2016-02-26 Thread Aljoscha Krettek
Yes, that would be perfect. Thanks! -- Aljoscha > On 26 Feb 2016, at 20:53, Zach Cox wrote: > > Sure, want me to open a jira issue and then PR a new page into > https://github.com/apache/flink/tree/master/docs/internals, following these > instructions? http://flink.apache.org/contribute-docume

Re: Watermarks with repartition

2016-02-26 Thread Zach Cox
Sure, want me to open a jira issue and then PR a new page into https://github.com/apache/flink/tree/master/docs/internals, following these instructions? http://flink.apache.org/contribute-documentation.html -Zach On Fri, Feb 26, 2016 at 1:13 PM Aljoscha Krettek wrote: > Cool, that’s a nice wri

Re: Watermarks with repartition

2016-02-26 Thread Aljoscha Krettek
Cool, that’s a nice write up. Would you maybe be interested in integrating this as some sort of internal documentation in Flink? So that prospective contributors can get to know this stuff. Cheers, Aljoscha > On 26 Feb 2016, at 18:32, Zach Cox wrote: > > Thanks for the confirmation Aljoscha! I

Re: Watermarks with repartition

2016-02-26 Thread Zach Cox
Thanks for the confirmation Aljoscha! I wrote up results from my little experiment: https://github.com/zcox/flink-repartition-watermark-example -Zach On Fri, Feb 26, 2016 at 2:58 AM Aljoscha Krettek wrote: > Hi, > yes, your description is spot on! > > Cheers, > Aljoscha > > On 26 Feb 2016, at

Re: Watermarks with repartition

2016-02-26 Thread Aljoscha Krettek
Hi, yes, your description is spot on! Cheers, Aljoscha > On 26 Feb 2016, at 00:19, Zach Cox wrote: > > I think I found the information I was looking for: > > RecordWriter broadcasts each emitted watermark to all outgoing channels [1]. > > StreamInputProcessor tracks the max watermark received

Re: Watermarks with repartition

2016-02-25 Thread Zach Cox
I think I found the information I was looking for: RecordWriter broadcasts each emitted watermark to all outgoing channels [1]. StreamInputProcessor tracks the max watermark received on each incoming channel separately, and computes the task's watermark as the min of all incoming watermarks [2].

Re: Watermarks as "process completion" flags

2015-11-30 Thread Stephan Ewen
You cannot force a barrier at one point in time. At what time checkpoints are triggered is decided by the master node. I think in your case you can use the checkpoint and notification calls to figure out when data has flown through the DAG, but you cannot force a barrier at a specific point. On M

Re: Watermarks as "process completion" flags

2015-11-30 Thread Anton Polyakov
Hi Stephan sorry for misunderstanding, but how do I make sure barrier is placed at the proper time? How does my source "force" checkpoint to start happening once it finds that all needed elements are now produced? On Mon, Nov 30, 2015 at 2:13 PM, Stephan Ewen wrote: > Hi! > > If you implement t

Re: Watermarks as "process completion" flags

2015-11-30 Thread Stephan Ewen
Hi! If you implement the "Checkpointed" interface, you get the function calls to "snapshotState()" at the point when the checkpoint barrier arrives at an operator. So, the call to "snapshotState()" in the sink is when the barrier reaches the sink. The call to "checkpointComplete()" in the sources

Re: Watermarks as "process completion" flags

2015-11-30 Thread Anton Polyakov
Hi Stephan thanks that looks super. But source needs then to emit checkpoint. At the source, while reading source events I can find out that - this is the source event I want to take actions after. So if at ssource I can then emit checkpoint and catch it at the end of the DAG that would solve my p

Re: Watermarks as "process completion" flags

2015-11-30 Thread Anton Polyakov
Hi Stephan thanks that looks super. But source needs then to emit checkpoint. At the source, while reading source events I can find out that - this is the source event I want to take actions after. So if at ssource I can then emit checkpoint and catch it at the end of the DAG that would solve my p

Re: Watermarks as "process completion" flags

2015-11-30 Thread Stephan Ewen
Hi Anton! That you can do! You can look at the interfaces "Checkpointed" and "checkpointNotifier". There you will get a call at every checkpoint (and can look at what records are before that checkpoint). You also get a call once the checkpoint is complete, which corresponds to the point when ever

Re: Watermarks as "process completion" flags

2015-11-30 Thread Anton Polyakov
I think I can turn my problem into a simpler one. Effectively what I need - I need way to checkpoint certain events in input stream and once this checkpoint reaches end of DAG take some action. So I need a signal at the sink which can tell "all events in source before checkpointed event are now pr

Re: Watermarks as "process completion" flags

2015-11-29 Thread Anton Polyakov
Hi Fabian Defining a special flag for record seems like a checkpoint barrier. I think I will end up re-implementing checkpointing myself. I found the discussion in flink-dev: mail-archives.apache.org/mod_mbox/flink-dev/201511.mbox/…

Re: Watermarks as "process completion" flags

2015-11-24 Thread Fabian Hueske
Hi Anton, If I got your requirements right, you are looking for a solution that continuously produces updated partial aggregates in a streaming fashion. When a special event (no more trades) is received, you would like to store the last update as a final result. Is that correct? You can compute

Re: Watermarks as "process completion" flags

2015-11-24 Thread Anton Polyakov
Hi Max thanks for reply. From what I understand window works in a way that it buffers records while window is open, then apply transformation once window close is triggered and pass transformed result. In my case then window will be open for few hours, then the whole amount of trades will be proce

Re: Watermarks as "process completion" flags

2015-11-24 Thread Maximilian Michels
Hi Anton, You should be able to model your problem using the Flink Streaming API. The actions you want to perform on the streamed records correspond to transformations on Windows. You can indeed use Watermarks to signal the window that a threshold for an action has been reached. Otherwise an evict