Re: Keyed watermarks: A fine-grained watermark generation for Apache Flink

2025-05-15 Thread Мосин Николай
orkarounds, but all my attempts almost failed due to the lack of timers that would be relay on WM and which I do not use now. Кому: Zhanghao Chen (zhanghao.c...@outlook.com);Копия: user@flink.apache.org;Тема: Keyed watermarks: A fine-grained watermark generation for Apache Flink;15.05.2

Re: Keyed watermarks: A fine-grained watermark generation for Apache Flink

2025-05-15 Thread Zhanghao Chen
Thanks for the insightful sharing! Best, Zhanghao Chen From: Lasse Nedergaard Sent: Thursday, May 15, 2025 13:10 To: Zhanghao Chen Cc: mosin...@yandex.ru ; user@flink.apache.org Subject: Re: Keyed watermarks: A fine-grained watermark generation for Apache

Re: Keyed watermarks: A fine-grained watermark generation for Apache Flink

2025-05-14 Thread Lasse Nedergaard
ul in your > pipelines. > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-467%3A+Introduce+Generalized+Watermarks > > Best, > Zhanghao Chen > From: Мосин Николай > Sent: Thursday, May 15, 2025 3:58 > To: user@flink.apache.org > Subject: Keyed wa

Re: Keyed watermarks: A fine-grained watermark generation for Apache Flink

2025-05-14 Thread Zhanghao Chen
P-467%3A+Introduce+Generalized+Watermarks Best, Zhanghao Chen From: Мосин Николай Sent: Thursday, May 15, 2025 3:58 To: user@flink.apache.org Subject: Keyed watermarks: A fine-grained watermark generation for Apache Flink I found paper https://scholar.google.com/sc

Keyed watermarks: A fine-grained watermark generation for Apache Flink

2025-05-14 Thread Мосин Николай
I found paper https://scholar.google.com/scholar?q=10.1016/j.future.2025.107796 where described Keyed Watermarks that is what I need in my pipelines. Does anyone know is it planned to implement Keyed Watermarks in Flink and when?

Re: Watermarks

2023-10-02 Thread Perez
As per this link, it says that it only supports value_only for now as I am using pyflink. Does it mean I can't extract the timestamp appended by Kafka with pyflink as of now? https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/kafka/#deserializer or does it mean s

Re: Watermarks

2023-10-02 Thread Perez
Hi Liu and Jinfeng, I am trying to implement KafkaDeserializationSchema for Pyflink but am unable to get any examples. Can you share some links or references using which I can understand and try to implement myself? Perez sid.

Re: Watermarks

2023-09-13 Thread Perez
Cool thanks for the clarification. Sid. On Mon, Sep 11, 2023 at 9:22 AM liu ron wrote: > Hi, Sid > > For the second question, I think it is not needed. > > Best, > Ron > > Feng Jin 于2023年9月9日周六 21:19写道: > >> hi Sid >> >> >> 1. You can customize KafkaDeserializationSchema[1], in the `deserializ

Re: Watermarks

2023-09-10 Thread liu ron
Hi, Sid For the second question, I think it is not needed. Best, Ron Feng Jin 于2023年9月9日周六 21:19写道: > hi Sid > > > 1. You can customize KafkaDeserializationSchema[1], in the `deserialize` > method, you can obtain the Kafka event time. > > 2. I don't think it's necessary to explicitly mention

Re: Watermarks

2023-09-09 Thread Feng Jin
hi Sid 1. You can customize KafkaDeserializationSchema[1], in the `deserialize` method, you can obtain the Kafka event time. 2. I don't think it's necessary to explicitly mention the watermark strategy. [1]. https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connec

Watermarks

2023-09-09 Thread Sid
Hello experts, My source is Kafka and I am trying to generate records for which I have FlinkKafkaConsumer class. Now my first question is how to consume an event timestamp for the records generated. I know for a fact that for CLI, there is one property called *print.timestamp=true* which gives yo

Re: Watermarks lagging behind events that generate them

2023-03-16 Thread David Anderson
Watermarks are not included in checkpoints or savepoints. See [1] for some head-scratchingly-complicated info about restarts, watermarks, and unaligned checkpoints. [1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/#interplay-with

Re: Watermarks lagging behind events that generate them

2023-03-15 Thread Shammon FY
Hi Alexis Currently I think checkpoint and savepoint will not save watermarks. I think how to deal with watermarks at checkpoint/savepoint is a good question, we can discuss this in dev mail list Best, Shammon FY On Wed, Mar 15, 2023 at 4:22 PM Alexis Sarda-Espinosa < sarda.espin...@gmail.

Re: Watermarks lagging behind events that generate them

2023-03-15 Thread Alexis Sarda-Espinosa
e: > >> Hi David, thanks for the answer. One follow-up question: will the >> watermark be reset to Long.MIN_VALUE every time I restart a job with >> savepoint? >> >> Am Di., 14. März 2023 um 05:08 Uhr schrieb David Anderson < >> dander...@apache.org>: &

Re: Watermarks lagging behind events that generate them

2023-03-14 Thread Shammon FY
pin...@gmail.com> wrote: > Hi David, thanks for the answer. One follow-up question: will the > watermark be reset to Long.MIN_VALUE every time I restart a job with > savepoint? > > Am Di., 14. März 2023 um 05:08 Uhr schrieb David Anderson < > dander...@apache.org>: >

Re: Watermarks lagging behind events that generate them

2023-03-14 Thread Alexis Sarda-Espinosa
Hi David, thanks for the answer. One follow-up question: will the watermark be reset to Long.MIN_VALUE every time I restart a job with savepoint? Am Di., 14. März 2023 um 05:08 Uhr schrieb David Anderson < dander...@apache.org>: > Watermarks always follow the corresponding event(s). I&#

Re: Watermarks lagging behind events that generate them

2023-03-13 Thread David Anderson
Watermarks always follow the corresponding event(s). I'm not sure why they were designed that way, but that is how they are implemented. Windows maintain this contract by emitting all of their results before forwarding the watermark that triggered the results. David On Mon, Mar 13, 2023

Re: Watermarks lagging behind events that generate them

2023-03-13 Thread Shammon FY
Hi Alexis Do you use both event-time watermark generator and TimerService for processing time in your job? Maybe you can try using event-time watermark first. Best, Shammon.FY On Sat, Mar 11, 2023 at 7:47 AM Alexis Sarda-Espinosa < sarda.espin...@gmail.com> wrote: > Hello, > > I recently ran in

Watermarks lagging behind events that generate them

2023-03-10 Thread Alexis Sarda-Espinosa
Hello, I recently ran into a weird issue with a streaming job in Flink 1.16.1. One of my functions (KeyedProcessFunction) has been using processing time timers. I now want to execute the same job based on a historical data dump, so I had to adjust the logic to use event time timers in that case (a

Re: Non-temporal watermarks

2023-02-03 Thread David Anderson
DataStream time windows and Flink SQL make assumptions about the timestamps and watermarks being milliseconds since the epoch. But the underlying machinery does not. So if you limit yourself to process functions (for example), then nothing will assign any semantics to the time values. David On

Re: Non-temporal watermarks

2023-02-02 Thread James Sandys-Lumsdaine
when the watermark is reached and that required all sources to have reached at least that point in the recovery). Once we have reached the startup datetime watermark the system seamlessly flips into live processing mode. The watermarks still trigger my timers but now we are processing the last

Re: Non-temporal watermarks

2023-02-02 Thread Gen Luo
ically, the > concept of watermarks is more abstract, so I'll leave implementation > details aside. > > Speaking generally, yes, there is a set of requirements that must be met > in order to be able to generate a system that uses watermarks. > > The primary question is what

Re: Non-temporal watermarks

2023-02-02 Thread Jan Lukavský
Hi, I will not speak about details related to Flink specifically, the concept of watermarks is more abstract, so I'll leave implementation details aside. Speaking generally, yes, there is a set of requirements that must be met in order to be able to generate a system that uses water

Non-temporal watermarks

2023-02-01 Thread Yaroslav Tkachenko
Hey everyone, I'm wondering if anyone has done any experiments trying to use non-temporal watermarks? For example, a dataset may contain some kind of virtual timestamp / version field that behaves just like a regular timestamp (monotonically increasing, etc.), but has a different scale /

RE: Processing watermarks in a broadcast connected stream

2023-01-31 Thread Schwalbe Matthias
) number of DataStream keyed/broadcast/plain and also to tap into the meta-stream of watermark events. Each Input is set up separately and can implement separate handlers for the events/watermarks/etc. However, it is an operator implementation, you e.g. need to manually set up timer manager and a

Processing watermarks in a broadcast connected stream

2023-01-30 Thread Sajjad Rizvi
Hi, I am trying to process watermarks in a BroadcastConnectedStream. However, I am not able to find any direct way to handle watermark events, similar to what we have in processWatermark1 in a KeyedCoProcessOperator. Following are further details. In the context of the example given in “A

Re: Can't use nested attributes as watermarks in Table

2022-12-17 Thread Theodor Wübker
be interested (since obviously it would help my thesis). Maybe you can tell how much effort it would be? I imagine it would need support in the place where the watermarks are registered (the one I sent) and in the place they are actually used (which I have not checked yet at all). -Theo > On

Re: Can't use nested attributes as watermarks in Table

2022-12-16 Thread Martijn Visser
Hi Theo, The most logical reason is that nested attributes were added later than watermarks were :) I agree that it's something that would be worthwhile to improve. If you can and want to make a contribution on this, that would be great. Best regards, Martijn On Wed, Dec 14, 2022 at 9:

Re: Can't use nested attributes as watermarks in Table

2022-12-14 Thread Theodor Wübker
Actually, this behaviour is documented <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#create-table> (See the Watermarks section, where it is stated that the column must be a “top-level” column). So I suppose, there is a reason. Nevertheless it is qu

Can't use nested attributes as watermarks in Table

2022-12-13 Thread Theodor Wübker
Hey everyone, I have encountered a problem with my Table API Program. I am trying to use a nested attribute as a watermark. The structure of my schema is a row, which itself has 3 rows as attributes and they again have some attributes, especially the Timestamp that I want to use as a watermark.

Re: Overwriting watermarks in DataStream

2022-08-22 Thread Peter Schrott
Hi David, Thanks a lot for clarification. Best, Peter > On 21. Aug 2022, at 18:36, David Anderson wrote: > > If you have two watermark strategies in your job, the downstream > TimestampsAndWatermarksOperator will absorb incoming watermarks and not > forward them downstre

Re: Overwriting watermarks in DataStream

2022-08-21 Thread David Anderson
If you have two watermark strategies in your job, the downstream TimestampsAndWatermarksOperator will absorb incoming watermarks and not forward them downstream, but it will have no effect upstream. The only exception to this is that watermarks equal to Long.MAX_VALUE are forwarded downstream

Overwriting watermarks in DataStream

2022-08-18 Thread Peter Schrott
Hi there, While still struggling with events and watermarks out of order after sorting with a buffer process function (compare [1]) I tired to solve the issue by assigning a new watermark after the mentioned sorting function. The Flink docs [2] are not very certain about the impact of

Eventtimes and watermarks not in sync after sorting stream by eventide

2022-08-17 Thread Peter Schrott
the subsequent event times, the event time of the current event is compared with the current watermark (context.timerService().currentWatermark()). Result: Watermarks and event timestamps are NOT ascending. On some events the event timestamp is lower than the current watermark. I somehow suspect

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-29 Thread r pp
>>> .withTimestampAssigner(new >>>> SerializableTimestampAssigner[StarscreamEventCounter_V1] { >>>> override def extractTimestamp(element: StarscreamEventCounter_V1, >>>> recordTimestamp: Long): Long = >>>> element.envelopeTimestamp >&

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-04 Thread Ryan van Huuksloot
uration.of(2, ChronoUnit.HOURS)) >>> .withTimestampAssigner(new >>> SerializableTimestampAssigner[StarscreamEventCounter_V1] { >>> override def extractTimestamp(element: StarscreamEventCounter_V1, >>> recordTimestamp: Long): Long = >>> el

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-04 Thread Arvid Heise
.withTimestampAssigner(new >> SerializableTimestampAssigner[StarscreamEventCounter_V1] { >> override def extractTimestamp(element: StarscreamEventCounter_V1, >> recordTimestamp: Long): Long = >> element.envelopeTimestamp >> }) >> >> The Watermarks are correct

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-01 Thread r pp
f(2, ChronoUnit.HOURS)) > .withTimestampAssigner(new > SerializableTimestampAssigner[StarscreamEventCounter_V1] { > override def extractTimestamp(element: StarscreamEventCounter_V1, > recordTimestamp: Long): Long = > element.envelopeTimestamp > }) > > The Watermarks are

[DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-03-31 Thread Ryan van Huuksloot
extractTimestamp(element: StarscreamEventCounter_V1, recordTimestamp: Long): Long = element.envelopeTimestamp }) The Watermarks are correctly getting assigned. However, when a reduce function is used the window never terminates because the `ctx.getCurrentWatermark()` returns the default

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
toString()); out.collect(((ObjectNode) originalEvent).toString()); } } } Op di 29 mrt. 2022 om 15:23 schreef Schwalbe Matthias < matthias.schwa...@viseca.ch>: > Hello Hans-Peter, > > > > I’m a little confused which version of your code you

RE: Watermarks event time vs processing time

2022-03-29 Thread Schwalbe Matthias
Hello Hans-Peter, I’m a little confused which version of your code you are testing against: * ProcessingTimeSessionWindows or EventTimeSessionWindows? * did you keep the withIdleness() ?? As said before: * for ProcessingTimeSessionWindows, watermarks play no role * if you keep

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
r on >2. So far you used a session window to determine the point in time >when to emit the stored/enriched/sorted events >3. Watermarks are generated with bounded out of orderness >4. You use session windows with a specific gap >5. In your experiment you ever onl

RE: Interval join operator is not forwarding watermarks correctly

2022-03-17 Thread Alexis Sarda-Espinosa
Hi Dawid, Thanks for the update, I also managed to work around it by adding another watermark assignment operator between the join and the window. I’ll have to see if it’s possible to assign watermarks at the source, but even if it is, I worry that the different partitions created by all my

Re: Interval join operator is not forwarding watermarks correctly

2022-03-17 Thread Dawid Wysakowicz
Hi Alexis, I tried looking into your example. First of all, so far, I've spent only a limited time looking at the WatermarkGenerator, and I have not thoroughly understood how it works. I'd discourage assigning watermarks anywhere in the middle of your pipeline. This is considere

RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
ing of the watermarks on single operators / per subtask useful: Look for subtasks that don’t have watermarks, or too low watermarks for a specific session window to trigger. Thias From: HG Sent: Mittwoch, 16. März 2022 16:41 To: Schwalbe Matthias Cc: user Subject: Re: Watermarks event time

Re: Watermarks event time vs processing time

2022-03-16 Thread HG
e not > specific enough (please confirm or correct in your answer): > >1. You store incoming events in state per transaction_id to be >sorted/aggregated(min/max time) by event time later on >2. So far you used a session window to determine the point in time >when to emi

RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
/aggregated(min/max time) by event time later on 2. So far you used a session window to determine the point in time when to emit the stored/enriched/sorted events 3. Watermarks are generated with bounded out of orderness 4. You use session windows with a specific gap 5. In your experiment you

Watermarks event time vs processing time

2022-03-16 Thread HG
somehow the processing did not run as expected. When I pushed 1000 events eventually 800 or so would appear at the output. This was resolved by switching to ProcessingTimeSessionWindows . My thought was then that I could remove the watermarkstrategies with watermarks with timestamp assigners (handling

Re: Interval join operator is not forwarding watermarks correctly

2022-03-15 Thread Alexis Sarda-Espinosa
For completeness, this still happens with Flink 1.14.4 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Friday, March 11, 2022 12:21 AM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: Re: Interval join operator is not forwarding watermarks

Re: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
. [1] https://github.com/asardaes/flink-interval-join-test Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, March 10, 2022 7:47 PM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: RE: Interval join operator is not forwarding water

RE: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
: Interval join operator is not forwarding watermarks correctly Hello, I'm in the process of updating from Flink 1.11.3 to 1.14.3, and it seems the interval join in my pipeline is no longer working. More specifically, I have a sliding window after the interval join, and the window isn't fir

Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
(...); stream1.connect(stream2) .keyBy(selector1, selector2) .transform("Interval Join", TypeInformation.of(Pojo.class), joinOperator); --- Some more information in case it's relevant: - stream2 is obtained from a side output. - both stream1 and stream2 have watermarks as

Re: table api watermarks, timestamps, outoforderness and head aches

2022-02-14 Thread Francesco Guardiani
en you should set >> an idleness which after that, a watermark is produced. >> >> Idleness is >> >> On Fri, Feb 11, 2022 at 2:53 PM HG wrote: >> >>> Hi, >>> >>> I am getting a headache when thinking about watermarks and timestamps. &g

Re: table api watermarks, timestamps, outoforderness and head aches

2022-02-14 Thread HG
le of days nothing is produced), then you should set an > idleness which after that, a watermark is produced. > > Idleness is > > On Fri, Feb 11, 2022 at 2:53 PM HG wrote: > >> Hi, >> >> I am getting a headache when thinking about watermarks and timestam

Re: table api watermarks, timestamps, outoforderness and head aches

2022-02-14 Thread Francesco Guardiani
e of days nothing is produced), then you should set an idleness which after that, a watermark is produced. Idleness is On Fri, Feb 11, 2022 at 2:53 PM HG wrote: > Hi, > > I am getting a headache when thinking about watermarks and timestamps. > My application reads events from Kafka

table api watermarks, timestamps, outoforderness and head aches

2022-02-11 Thread HG
Hi, I am getting a headache when thinking about watermarks and timestamps. My application reads events from Kafka (they are in json format) as a Datastream Events can be keyed by a transactionId and have a event timestamp (handlingTime) All events belonging to a single transactionId will arrive

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-18 Thread Arvid Heise
processFunction will just emit watermarks from upstream as they come. No function/operator in Flink is a black hole w.r.t. watermarks. It's just important to remember that watermark after a network shuffle is always the min of all inputs (ignoring idle inputs). So if any connected upstream pa

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-15 Thread Ahmad Alkilani
Thanks again Arvid, I am getting closer to the culprit as I've found some interesting scenarios. Still no exact answer yet. We are indeed also using .withIdleness to mitigate slow/issues with partitions. I did have a few clarifying questions though w.r.t watermarks if you don't mind.

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-14 Thread Arvid Heise
rmark upstream. A usual suspect when not seeing good watermarks is that the custom watermark assigner is not working as expected. But you mentioned that with a no-op, the process function is actually showing the watermark and that leaves me completely puzzled. I would dump down your example even mo

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-12 Thread Ahmad Alkilani
st to show progress. So now: Kafka -> Flink Kafka source -> flatMap (map & filter) -> assignTimestampsAndWaterMarks -> map Function -> *process function (print watermarks) *-> Key By -> Keyed Process Function -> *process function (print watermarks)* -> Simple Sink I am seei

Re: Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-11 Thread Arvid Heise
that, can you please simply remove AsyncIO+Sink from your job and check for print statements? On Tue, Oct 12, 2021 at 3:23 AM Ahmad Alkilani wrote: > Flink 1.11 > I have a simple Flink application that reads from Kafka, uses event > timestamps, assigns timestamps and watermarks and the

Flink 1.11 loses track of event timestamps and watermarks after process function

2021-10-11 Thread Ahmad Alkilani
Flink 1.11 I have a simple Flink application that reads from Kafka, uses event timestamps, assigns timestamps and watermarks and then key's by a field and uses a KeyedProcessFunciton. The keyed process function outputs events from with the `processElement` method using `out.collect`. No t

Re: Empty Kafka topics and watermarks

2021-10-11 Thread Piotr Nowojski
r Nowojski > *Sent:* 08 October 2021 13:17 > *To:* James Sandys-Lumsdaine > *Cc:* user@flink.apache.org > *Subject:* Re: Empty Kafka topics and watermarks > > Hi James, > > I believe you have encountered a bug that we have already fixed [1]. The > small problem is t

Re: Empty Kafka topics and watermarks

2021-10-11 Thread James Sandys-Lumsdaine
Kafka topics and watermarks Hi James, I believe you have encountered a bug that we have already fixed [1]. The small problem is that in order to fix this bug, we had to change some `@PublicEvolving` interfaces and thus we were not able to backport this fix to earlier minor releases. As such, this

Re: Empty Kafka topics and watermarks

2021-10-08 Thread Piotr Nowojski
Hi James, I believe you have encountered a bug that we have already fixed [1]. The small problem is that in order to fix this bug, we had to change some `@PublicEvolving` interfaces and thus we were not able to backport this fix to earlier minor releases. As such, this is only fixed starting from

Empty Kafka topics and watermarks

2021-10-08 Thread James Sandys-Lumsdaine
Hi everyone, I'm putting together a Flink workflow that needs to merge historic data from a custom JDBC source with a Kafka flow (for the realtime data). I have successfully written the custom JDBC source that emits a watermark for the last event time after all the DB data has been emitted but

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Marco Villalobos
ay. >> When we tried to shutdown a job with a savepoint, the watermarks became >> equal to 2^63 - 1. >> >> This caused timers to fire indefinitely and crash downstream systems with >> overloaded untrue data. >> >> We are using event time processing with Kafka

RE: stream processing savepoints and watermarks question

2021-09-24 Thread Schwalbe Matthias
the infinite iteration over timers … I believe the behavior exhibited by flink is intentional and no defect! What do you think? Thias From: JING ZHANG Sent: Freitag, 24. September 2021 12:25 To: Guowei Ma Cc: Marco Villalobos ; user Subject: Re: stream processing savepoints and watermarks quest

Re: stream processing savepoints and watermarks question

2021-09-24 Thread JING ZHANG
r > could only be triggered when there is a watermark (except the "quiesce > phase"). > I think it could not advance any watermarks after MAX_WATERMARK is > received. > > Best, > Guowei > > > On Fri, Sep 24, 2021 at 4:31 PM JING ZHANG wrote: > >> Hi Guow

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Guowei Ma
Hi, JING Thanks for the case. But I am not sure this would happen. As far as I know the event timer could only be triggered when there is a watermark (except the "quiesce phase"). I think it could not advance any watermarks after MAX_WATERMARK is received. Best, Guowei On Fri, Sep 2

Re: stream processing savepoints and watermarks question

2021-09-24 Thread JING ZHANG
; https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint >> >> Best, >> JING ZHANG >> >> Marco Villalobos 于2021年9月24日周五 下午12:54写道: >> >>> Something strange happened today. >>&

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Guowei Ma
ay. >> When we tried to shutdown a job with a savepoint, the watermarks became >> equal to 2^63 - 1. >> >> This caused timers to fire indefinitely and crash downstream systems with >> overloaded untrue data. >> >> We are using event time processing with Kafka as o

Re: stream processing savepoints and watermarks question

2021-09-23 Thread JING ZHANG
-final-savepoint Best, JING ZHANG Marco Villalobos 于2021年9月24日周五 下午12:54写道: > Something strange happened today. > When we tried to shutdown a job with a savepoint, the watermarks became > equal to 2^63 - 1. > > This caused timers to fire indefinitely and crash downstream systems wi

stream processing savepoints and watermarks question

2021-09-23 Thread Marco Villalobos
Something strange happened today. When we tried to shutdown a job with a savepoint, the watermarks became equal to 2^63 - 1. This caused timers to fire indefinitely and crash downstream systems with overloaded untrue data. We are using event time processing with Kafka as our source. It seems

Re: Theory question on process_continously processing mode and watermarks

2021-08-19 Thread Arvid Heise
I think what you are seeing is that the files have records with similar timestamps. That means after reading file1 your watermarks are already progressed to the end of your time range. When Flink picks up file2, all records are considered late records and no windows fire anymore. See [1] for a

Re: Theory question on process_continously processing mode and watermarks

2021-08-19 Thread Caizhi Weng
. > > Using FileProcessingMode.*PROCESS_CONTINUOUSLY* > > Into a streaming job that uses tumbling Windows and watermarks causes my > streaming process to stop ad the reading files phase. > > Meanwhile if i delete my declarations of Windows and watermark the program > works as expected. &g

Theory question on process_continously processing mode and watermarks

2021-08-19 Thread Fra
Hello, during my personal development of a Flink streaming Platform i found something that perplexes me.Using FileProcessingMode.PROCESS_CONTINUOUSLYInto a streaming job that uses tumbling Windows and watermarks causes my streaming process to stop ad the reading files phase.Meanwhile if i delete

Re: Watermarks in Event Time Temporal Join

2021-04-28 Thread Leonard Xu
Thanks for your example, Maciej I can explain more about the design. > Let's have events. > S1, id1, v1, 1 > S2, id1, v2, 1 > > Nothing is happening as none of the streams have reached the watermark. > Now let's add > S2, id2, v2, 101 > This should trigger join for id1 because we have all the kno

Re: Watermarks in Event Time Temporal Join

2021-04-28 Thread Maciej Bryński
Hi Leonard, Let's assume we have two streams. S1 - id, value1, ts1 with watermark = ts1 - 1 S2 - id, value2, ts2 with watermark = ts2 - 1 Then we have following interval join SELECT id, value1, value2, ts1 FROM S1 JOIN S2 ON S1.id = S2.id and ts1 between ts2 - 1 and ts2 Let's have events. stream,

Re: Watermarks in Event Time Temporal Join

2021-04-26 Thread Leonard Xu
Hello, Maciej > I agree the watermark should pass on versioned table side, because > this is the only way to know which version of record should be used. > But if we mimics behaviour of interval join then main stream watermark > could be skipped. IIRC, rowtime interval join requires the watermark

Re: Watermarks in Event Time Temporal Join

2021-04-25 Thread Maciej Bryński
essage is late or early. If we only > use the watermark on versioned table side, we have no means to determine > whether the event in the main stream is ready to emit. > > Best, > Shengkai > > maverick 于2021年4月26日周一 上午2:31写道: >> >> Hi, >> I'm curious why

Re: Watermarks in Event Time Temporal Join

2021-04-25 Thread Shengkai Fang
curious why Event Time Temporal Join needs watermarks from both sides > to > perform join. > > Shouldn't watermark on versioned table side be enough to perform join ? > > > > > > -- > Sent from: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ >

Watermarks in Event Time Temporal Join

2021-04-25 Thread maverick
Hi, I'm curious why Event Time Temporal Join needs watermarks from both sides to perform join. Shouldn't watermark on versioned table side be enough to perform join ? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink 1.11 FlinkKafkaConsumer not propagating watermarks

2021-04-21 Thread Arvid Heise
processors using Flink 1.12, and tried to get them working on Amazon > EMR. However Amazon EMR only supports Flink 1.11.2 at the moment. When I > went to downgrade, I found, inexplicably, that watermarks were no longer > propagating. > > There is only one partition on the topic,

Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-19 Thread Arvid Heise
Hi Mathieu, The easiest way is to already emit several inputs on the source level. If you use DeserializationSchema, try to use the method with the collector. The watermarks should then be generated as if you would only receive one element at a time. On Sun, Apr 18, 2021 at 11:08 AM Mathieu D

Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-18 Thread Mathieu D
-9:00 aggregation m2 in the 9:00-10:00 aggregation What's the proper way to set the watermarks in such a case ? Thanks for your insights ! Mathieu Le sam. 17 avr. 2021 à 07:05, Lasse Nedergaard < lassenedergaardfl...@gmail.com> a écrit : > Hi > > One thing to remember is that F

proper way to manage watermarks with messages combining multiple timestamps

2021-04-16 Thread Mathieu D
Hello, I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks. We're processing messages from iot devices. Those messages have a timestamp, and we have a first phase of processing based on this timestamp. So far so good. These messages

Flink 1.11 FlinkKafkaConsumer not propagating watermarks

2021-04-14 Thread Edward Bingham
Hi everyone, I'm seeing some strange behavior from FlinkKafkaConsumer. I wrote up some Flink processors using Flink 1.12, and tried to get them working on Amazon EMR. However Amazon EMR only supports Flink 1.11.2 at the moment. When I went to downgrade, I found, inexplicably, that watermarks

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
Thank you. Yuan Mei 于2021年3月4日周四 下午11:10写道: > Hey Yidan, > > KafkaShuffle is initially motivated to support shuffle data > materialization on Kafka, and started with a limited version supporting > hash-partition only. Watermark is maintained and forwarded as part of > shuffle data. So you are ri

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread Yuan Mei
Hey Yidan, KafkaShuffle is initially motivated to support shuffle data materialization on Kafka, and started with a limited version supporting hash-partition only. Watermark is maintained and forwarded as part of shuffle data. So you are right, watermark storing/forwarding logic has nothing to do

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
And do you know when kafka consumer/producer will be re implemented according to the new source/sink api? I am thinking whether I should adjust the code for now, since I need to re adjust the code when it is reconstructed to the new source/sink api. yidan zhao 于2021年3月4日周四 下午4:44写道: > I uploaded

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
I uploaded a picture to describe that. https://ftp.bmp.ovh/imgs/2021/03/2068f2e22045e696.png >

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
One more question, If I only need watermark's logic, not keyedStream, why not provide methods such as writeDataStream and readDataStream. It uses the similar methods for kafka producer sink records and broadcast watermark to partitions and then kafka consumers read it and regenerate the watermark.

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread Piotr Nowojski
Great :) Just one more note. Currently FlinkKafkaShuffle has a critical bug [1] that probably will prevent you from using it directly. I hope it will be fixed in some next release. In the meantime you can just inspire your solution with the source code. Best, Piotrek [1] https://issues.apache.o

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread yidan zhao
Yes, you are right and thank you. I take a brief look at what FlinkKafkaShuffle is doing, it seems what I need and I will have a try. >

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread Piotr Nowojski
Hi, Can not you write the watermark as a special event to the "mid-topic"? In the "new job2" you would parse this event and use it to assign watermark before `xxxWindow2`? I believe this is what FlinkKafkaShuffle is doing [1], you could look at its code for inspiration. Piotrek [1] https://ci.ap

how to propagate watermarks across multiple jobs

2021-03-01 Thread yidan zhao
I have a job which includes about 50+ tasks. I want to split it to multiple jobs, and the data is transferred through Kafka, but how about watermark? Is anyone have do something similar and solved this problem? Here I give an example: The original job: kafkaStream1(src-topic) => xxxProcess => xxx

Re: Watermarks on map operator

2021-02-05 Thread David Anderson
Basically the only thing that Watermarks do is to trigger event time timers. Event time timers are used explicitly in KeyedProcessFunctions, but are also used internally by time windows, CEP (to sort the event stream), in various time-based join operations, and within the Table/SQL API. If you

Re: Watermarks on map operator

2021-02-04 Thread Kezhu Wang
> it is not clear to me if watermarks are also used by map/flatmpat operators or just by window operators. Watermarks are most liked only used by timing segmented aggregation operator to trigger result materialization. In streaming, this “timing segmentation” is usually called “windowing”, so

Watermarks on map operator

2021-02-04 Thread Antonis Papaioannou
Hi, reading through the documentation regarding waterrmarks, it is not clear to me if watermarks are also used by map/flatmpat operators or just by window operators. My application reads from a kafka topic (with multiple partitions) and extracts assigns timestamp on each tuple based on some

  1   2   3   4   >