Session Window Join in pyflink

2025-03-26 Thread Hugo POLSINELLI
Hello everyone, I've been struggling for a while to perform a session-window join using pyflink / SQL API. From my understanding it seems that the table API requires an equality on the window_start and window_end parameters. Unfortunately, in the case of session join (with static gap)

Re:Event-Driven Window Computation

2024-07-17 Thread Xuyang
-- Best! Xuyang At 2024-07-17 17:22:01, "liu ze" wrote: Hi, Currently, Flink's windows are based on time (or a fixed number of elements). I want to trigger window computation based on specific events (marked within the data). In the DataStream API, this can be

Event-Driven Window Computation

2024-07-17 Thread liu ze
Hi, Currently, Flink's windows are based on time (or a fixed number of elements). I want to trigger window computation based on specific events (marked within the data). In the DataStream API, this can be achieved using GlobalWindow and custom triggers, but how can it be done in Flin

Re: What is the best way to aggregate data over a long window

2024-05-20 Thread gongzhongqiang
Hi Sachin, `performing incremental aggregation using stateful processing` is same as `windows with agg`, but former is more flexible.If flink window can not satisfy your performance needs ,and your business logic has some features that can be customized for optimization. You can choose the

Re: What is the best way to aggregate data over a long window

2024-05-17 Thread Sachin Mittal
= reducedPlayerStatsData .keyBy(new KeySelector()) .window( TumblingEventTimeWindows.of(Time.seconds(secs))) .aggregate(new DataAggregator()) .name("aggregate"); In this case data which is aggregated is of a different type than the input so I had to use

Re: What is the best way to aggregate data over a long window

2024-05-17 Thread gongzhongqiang
- use multi-level time window granularity for pre-aggregation can significantly improve performance and reduce computation latency Best, Zhongqiang Gong Sachin Mittal 于2024年5月17日周五 03:48写道: > Hi, > My pipeline step is something like this: > > SingleOutputStreamOperator reducedData

What is the best way to aggregate data over a long window

2024-05-16 Thread Sachin Mittal
Hi, My pipeline step is something like this: SingleOutputStreamOperator reducedData = data .keyBy(new KeySelector()) .window( TumblingEventTimeWindows.of(Time.seconds(secs))) .reduce(new DataReducer()) .name("reduce"); This works fin

Re: How to debug window step in flink

2024-04-08 Thread Sachin Mittal
, Apr 8, 2024 at 11:55 AM wrote: > Hi Sachin > > > > What exactly does the MyReducer do? Can you provide us with some code? > > > > Just a wild guess from my side, did you check the watermarking? If the > Watermarks aren't progressing there's no way for F

Re: How to debug window step in flink

2024-04-07 Thread Dominik.Buenzli
Hi Sachin What exactly does the MyReducer do? Can you provide us with some code? Just a wild guess from my side, did you check the watermarking? If the Watermarks aren't progressing there's no way for Flink to know when to emit a window and therefore you won't see any outgoin

How to debug window step in flink

2024-04-07 Thread Sachin Mittal
Hi, I have a following windowing step in my pipeline: inputData .keyBy(new MyKeySelector()) .window( TumblingEventTimeWindows.of(Time.seconds(60))) .reduce(new MyReducer()) .name("MyReducer"); Same step when I see in Flink UI shows as: Window(TumblingEventT

Concerns and Anomalies in Flink Window Functions with TumblingProcessingTimeWindows

2023-12-05 Thread arjun s
Hi team, I'm a newcomer to Flink's window functions, specifically utilizing TumblingProcessingTimeWindows with a configured window duration of 20 minutes. However, I've noticed an anomaly where the window output occurs within 16 to 18 minutes. This has left me uncertain about wheth

Re: Inquiry Regarding Flink Tumbling Window Persistence and Restart Handling for File Source

2023-12-04 Thread Jeyhun Karimov
> > On Mon, Dec 4, 2023 at 8:03 PM arjun s wrote: > >> Hello team, >> I'm relatively new to Flink's window functions, and I've configured a >> tumbling window with a 10-minute duration. I'm wondering about the scenario >> where the Flink job is rest

Inquiry Regarding Flink Tumbling Window Persistence and Restart Handling for File Source

2023-12-04 Thread arjun s
Hello team, I'm relatively new to Flink's window functions, and I've configured a tumbling window with a 10-minute duration. I'm wondering about the scenario where the Flink job is restarted or the Flink application goes down. Is there a mechanism to persist the aggregated

Best practice way to conditionally discard a window and not serialize the results

2023-10-30 Thread Mark Petronic
I am reading stats from Kinesis, deserializing them into a stat POJO and then doing something like this using an aggregated window with no defined processWindow function: timestampedStats .keyBy(v -> v.groupKey()) .window(TumblingEventTimeWindows.of(Time.seco

Re:Delayed Window Trigger

2023-10-29 Thread Xuyang
Are you using Flink SQL? If using Flink SQL, the window is triggered when and only when the special data (with the expected timestamp after watermark) enters. It is not possible to trigger the window without changing the window-start and window-end column. -- Best! Xuyang At

Delayed Window Trigger

2023-10-27 Thread Kenan Kılıçtepe
Is it possible to trigger a window without changing window-start and window-end dates? I have a lot of jobs run in window tumble (3H) and when they are all triggered at the same time, it causes performance problems. If somehow I can delay some of them 10-15 minutes , without changing the

Delayed Window

2023-10-06 Thread Kenan Kılıçtepe
Hi, Is it possible to delay a window trigger without changing window-end and window-start times? Thanks

Re: Window aggregation on two joined table

2023-09-21 Thread Eugenio Marotti
t time passes > by the end of the time window > The event time/watermark time of you join operator is the minimum watermark > time of all inputs > Because your second table does not emit watermark, it’s watermark time > remains at Long.MinValue, hence also the operator time stay

RE: Window aggregation on two joined table

2023-09-21 Thread Schwalbe Matthias
semantics on the second table as well * The logic is this: * Your join operator only generates output windows once the event time passes by the end of the time window * The event time/watermark time of you join operator is the minimum watermark time of all inputs * Because your

Re: Window aggregation on two joined table

2023-09-21 Thread Eugenio Marotti
Hi Matthias, No the second table doesn’t have an event time and a watermark specified. In order for the window to work do I need a watermark also on the second table? Thanks Eugenio > Il giorno 21 set 2023, alle ore 13:45, Schwalbe Matthias > ha scritto: > > Ciao Eugenio, >

RE: Window aggregation on two joined table

2023-09-21 Thread Schwalbe Matthias
you think? Cari saluti Thias From: Eugenio Marotti Sent: Thursday, September 21, 2023 8:56 AM To: user@flink.apache.org Subject: Window aggregation on two joined table Hi, I’m trying to execute a window aggregation on two joined table from two Kafka topics (upsert fashion), but I get no output

Window aggregation on two joined table

2023-09-20 Thread Eugenio Marotti
Hi, I’m trying to execute a window aggregation on two joined table from two Kafka topics (upsert fashion), but I get no output. Here’s the code I’m using: This is the first table from Kafka with an event time watermark on ‘data_fine’ attribute: final TableDescriptor

Re: Flink SQL query with window-TVF fails

2023-08-14 Thread liu ron
ing to run a window aggregation SQL query (on Flink 1.16) with > Windowing TVF for a TUMBLE window with a size of 5 Milliseconds it seems > Flink does not let a window size use a time unit smaller than seconds. Is > that correct? > (The documentation > <https://nightlies.apache

Flink SQL query with window-TVF fails

2023-08-14 Thread Pouria Pirzadeh
I am trying to run a window aggregation SQL query (on Flink 1.16) with Windowing TVF for a TUMBLE window with a size of 5 Milliseconds it seems Flink does not let a window size use a time unit smaller than seconds. Is that correct? (The documentation <https://nightlies.apache.org/flink/flink-d

Re: Average on sliding window

2023-07-04 Thread Alexey Novakov via user
Hi Eugenio, I think it is due to window completion which will be complete once your watermarked field on the event advances 15 days interval since the first received event time. Please also check this default trigger behavior here: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs

Average on sliding window

2023-07-01 Thread Eugenio Marotti
Hi everyone, I’m trying to calculate an average with a sliding window. Here’s the code I’m using. First of all I receive a series of events from a Kafka topic. I declared a watermark on the ‘data_fine’ attribute. final TableDescriptor filteredPhasesDurationsTableDescriptor

Re: Issue with Incremental window aggregation using Aggregate function.

2023-05-18 Thread Sumanta Majumdar
y using TumblingWindow windows > assigner implementation is that the state sizes are growing unconditionally > even when we have tuned rocksdb options and provided a good chunk of > managed memory. > > We are able to read more than 15 records within a period of 4 mins > which

Re: how to configure window of join operator in batch mode

2023-04-26 Thread Shammon FY
Hi Jiadong, >From the context you described, I think ProcessingTimeWindow may not be a good solution. If I understand correctly, you'd like to use the same SQL for streaming and batch jobs in your platform. How about creating partitioned Sink tables for streaming jobs instead of Window?

Re: how to configure window of join operator in batch mode

2023-04-26 Thread Jiadong Lu
r scenario. Thank you for your time and I look forward to hearing from you soon. Best, Jiadong Lu On 2023/4/26 18:13, Shammon FY wrote: Hi Jiadong Using the process time window in Batch jobs may be a little strange for me. I prefer to partition the data according to the day level, and then

Re: how to configure window of join operator in batch mode

2023-04-26 Thread Shammon FY
Hi Jiadong Using the process time window in Batch jobs may be a little strange for me. I prefer to partition the data according to the day level, and then the Batch job reads data from different partitions instead of using Window. Best, Shammon FY On Wed, Apr 26, 2023 at 12:03 PM Jiadong Lu

Re: how to configure window of join operator in batch mode

2023-04-25 Thread Jiadong Lu
Hi, Shammon, Thank you for your reply. Yes, the window configured with `Time.days(1)` has no special meaning, it is just used to group all data into the same global window. I tried using `GlobalWindow` for this scenario, but `GlobalWindow` also need a `Trigger` like

Re: how to configure window of join operator in batch mode

2023-04-25 Thread Shammon FY
Hi Jiadong, I think it depends on the specific role of the window here for you. If this window has no specific business meaning and is only used for performance optimization, maybe you can consider to use join directly Best, Shammon FY On Tue, Apr 25, 2023 at 5:42 PM Jiadong Lu wrote: > He

how to configure window of join operator in batch mode

2023-04-25 Thread Jiadong Lu
Hello,everyone, I am confused about the window of join/coGroup operator in Batch mode. Here is my demo code, and it works fine for me at present. I wonder if this approach that using process time window in batch mode is appropriate? and does this approach have any problems? I want to use this

Issue with Incremental window aggregation using Aggregate function.

2023-04-21 Thread Sumanta Majumdar
rocksdb options and provided a good chunk of managed memory. We are able to read more than 15 records within a period of 4 mins which is our time window set based on our requirements. Now one optimization which I see is suggested through the flink docs in order to reduce the state size is to use

Re: Query on ProcessingTime Triggers on EventTime based window

2023-03-07 Thread Shammon FY
Hi I think you can give more detail such as example can help us to trace the cause, thanks Best, Shammon On Tue, Mar 7, 2023 at 5:31 PM Saurabh Singh via user wrote: > Hi Community, > > We have the below use case, > >- We have to use EventTime for Windowing (Tumbl

Query on ProcessingTime Triggers on EventTime based window

2023-03-07 Thread Saurabh Singh via user
Hi Community, We have the below use case, - We have to use EventTime for Windowing (Tumbling Window) and Watermarking. - We use *TumbingEventTimeWindows* for this - We have to continuously emit the results for Window every 1 minute. - We are planning to use

Re: Whether Flink SQL window operations support "Allow Lateness and SideOutput"?

2023-02-21 Thread Weihua Hu
One question as title: Whether Flink SQL window operations support > "Allow Lateness and SideOutput"? > > Just as supported in Datastream api (allowedLateness > and sideOutputLateData) like: > > SingleOutputStreamOperator<

Whether Flink SQL window operations support "Allow Lateness and SideOutput"?

2023-02-20 Thread wang
Hi dear engineers, One question as title: Whether Flink SQL window operations support "Allow Lateness and SideOutput"? Just as supported in Datastream api (allowedLateness and sideOutputLateData) like: SingleOutputStreamOperator<>sumStream = dataStream.ke

Re:Re: Unable to do event time window aggregation with Kafka source

2023-02-06 Thread wei_yuze
Hi Yuxia, Thanks for your reply! I expected the program to do reduce by key. It should count the number of data having the same username field. The program threw no exception, but it did not produce the expected output. Best regards, Lucas

Re: Unable to do event time window aggregation with Kafka source

2023-02-06 Thread yuxia
Hi, Lucas. What do you mean by saying "unable to do event time window aggregation with watermarkedStream"? What exception it will throw? Best regards, Yuxia 发件人: "wei_yuze" 收件人: "User" 发送时间: 星期二, 2023年 2 月 07日 下午 1:43:59 主题: Unable to do event time

Unable to do event time window aggregation with Kafka source

2023-02-06 Thread wei_yuze
Hello! I was unable to do event time window aggregation with Kafka source, but had no problem with "fromElement" source. The code is attached as follow. The code has two data sources, named "streamSource" and "kafkaSource" respectively. The program works wel

Multiple Window Streams to same Kinesis Sink

2023-01-27 Thread Curtis Jensen
I'm trying to sink two Window Streams to the same Kinesis Sink. When I do this, no results are making it to the sink (code below). If I remove one of the windows from the Job, results do get published. Adding another stream to the sink seems to void both. How can I have results from both W

RE: Does reduce function on keyed window gives any guarantee on the order of elements?

2022-11-04 Thread Qing Lim
That’s my understanding as well, thanks for your confirmation. From: Yanfei Lei Sent: 04 November 2022 16:03 To: Qing Lim Cc: User Subject: Re: Does reduce function on keyed window gives any guarantee on the order of elements? Hi Qing, > am I right to think that there will be 1 red

Re: Does reduce function on keyed window gives any guarantee on the order of elements?

2022-11-04 Thread Yanfei Lei
Best, Yanfei Qing Lim 于2022年11月3日周四 16:17写道: > Hi Yanfei > > Thanks for the explanation. > > > > If I use reduce in the context of keyed stream with window, am I right to > think that there will be 1 reduce function per key, and they will never > overlap? Each re

RE: Does reduce function on keyed window gives any guarantee on the order of elements?

2022-11-03 Thread Qing Lim
Hi Yanfei Thanks for the explanation. If I use reduce in the context of keyed stream with window, am I right to think that there will be 1 reduce function per key, and they will never overlap? Each reduce function instance will only receive elements from the same key in order. From: Yanfei Lei

Re: Does reduce function on keyed window gives any guarantee on the order of elements?

2022-11-02 Thread Yanfei Lei
Hi Qing, > Does it guarantee that it will be called in the same order of elements in the stream, where value2 is always 1 element after value1? Order is maintained within each parallel stream partition. If the reduce operator only has one sending- sub-task, the answer is YES, but if reduce operato

Does reduce function on keyed window gives any guarantee on the order of elements?

2022-11-02 Thread Qing Lim
Hi, Flink User Group I am trying to use Reduce function, I wonder does it guarantee order when its called? The signature is as follow: T reduce(T value1, T value2) throws Exception; Does it guarantee that it will be called in the same order of elements in the stream, where value2 is always 1

Re: Window state size with global window and custom trigger

2022-10-10 Thread Alexis Sarda-Espinosa
Thanks for the confirmation :) Regards, Alexis. On Sun, 9 Oct 2022, 10:37 Hangxiang Yu, wrote: > Hi, Alexis. > I think you are right. It also applies for a global window with a custom > trigger. > If you apply a ReduceFunction or AggregateFunction, the window state size > usu

Re: Window state size with global window and custom trigger

2022-10-09 Thread Hangxiang Yu
Hi, Alexis. I think you are right. It also applies for a global window with a custom trigger. If you apply a ReduceFunction or AggregateFunction, the window state size usually is smaller than applying ProcessWindowFunction due to the aggregated value. It also works for global windows. Of course

Window state size with global window and custom trigger

2022-10-07 Thread Alexis Sarda-Espinosa
Hello, I found an SO thread that clarifies some details of window state size [1]. I would just like to confirm that this also applies when using a global window with a custom trigger. The reason I ask is that the TriggerResult API is meant to cover all supported scenarios, so FIRE vs

Re: Global window in batch mode

2022-09-30 Thread Yunfeng Zhou
/datastream/EndOfStreamWindows.java The usage would be like: .keyBy(new MyKeySelector()) .window(EndOfStreamWindows.get()) .reduce(new MyReduceFunction()) Best, Yunfeng Zhou On Thu, Sep 29, 2022 at 9:36 PM Vararu, Vadim wrote: > Hi all, > > > > I need to configure a keyed global wi

Global window in batch mode

2022-09-29 Thread Vararu, Vadim
Hi all, I need to configure a keyed global window that would trigger a reduce function for all the events in each key group before the processing finishes and the job closes. I have something similar for the realtime(streaming) version of the job, configured with a processing time gap

Serialization in window contents and network buffers

2022-09-27 Thread Alexis Sarda-Espinosa
Hi everyone, I know the low level details of this are likely internal, but at a high level we can say that operators usually have some state associated with them. Particularly for error handling and job restarts, I imagine windows must persist state, and operators in general probably persist netwo

Re: get state from window

2022-08-17 Thread yuxia
Sorry for misleading. After some investigation, seems UDTAGG can only used in flink table spi. Best regards, Yuxia - 原始邮件 - 发件人: "yuxia" 收件人: "user-zh" 抄送: "User" 发送时间: 星期四, 2022年 8 月 18日 上午 10:21:12 主题: Re: get state from window > does flink s

Re: get state from window

2022-08-17 Thread yanfei lei
Hi, there are two methods on the Context object that a process() invocation receives that allows access to the two types of state: - globalState(), which allows access to keyed state that is not scoped to a window - windowState(), which allows access to keyed state that is also scoped

Re: get state from window

2022-08-17 Thread yuxia
> does flink sql support UDTAGG? Yes, Flink sql support UDTAGG. Best regards, Yuxia - 原始邮件 - 发件人: "曲洋" 收件人: "user-zh" , "User" 发送时间: 星期四, 2022年 8 月 18日 上午 10:03:24 主题: get state from window Hi dear engineers, I have one question: does flink stre

Flink SQL and tumble window by size (number of rows)

2022-08-02 Thread Marco Villalobos
Is it possible in Flink SQL to tumble a window by row size instead of time? Let's say that I want a window for every 1 rows for example using the Flink SQL API. is that possible? I can't find any documentation on how to do that, and I don't know if it is supported.

Re: Window aggregation fails after upgrading to Flink 1.15

2022-05-23 Thread Shengkai Fang
, userid, count(pageid) AS pgcnt FROM >> TABLE(TUMBLE(TABLE myTable, DESCRIPTOR(rowtime), INTERVAL '5' SECONDS)) >> WHERE (p_userid <> 'User_6') GROUP BY window_start, window_end, userid >> >> >> >> -- >> Best! >> Xuya

Re: Window aggregation fails after upgrading to Flink 1.15

2022-05-20 Thread Pouria Pirzadeh
AS pgcnt FROM TABLE > (TUMBLE(TABLE myTable, DESCRIPTOR(rowtime), INTERVAL '5' SECONDS)) WHERE > (p_userid <> 'User_6') GROUP BY window_start, window_end, userid > > > > -- > Best! > Xuyang > > > At 2022-05-19 08:53:03, "Pouria Pi

Window aggregation fails after upgrading to Flink 1.15

2022-05-18 Thread Pouria Pirzadeh
I am running a Flink application in Java that performs window aggregation. The query runs successfully on Flink 1.14.4. However, after upgrading to Flink 1.15.0 and switching the code to use Windowing TVF, it fails with a runtime error as planner can not compile and instantiate window Aggs Handler

Re: [State Processor API] unable to load the state of a trigger attached to a session window

2022-05-12 Thread Dongwon Kim
Hi Juntao, Thanks a lot for taking a look at this. After a little inspection, I found that elements (window state) are stored > in namespace TimeWindow{start=1,end=11}, in your case, and trigger count > (trigger state) is stored in namespace TimeWindow{start=1,end=15}, but > WindowReade

Re: [State Processor API] unable to load the state of a trigger attached to a session window

2022-05-12 Thread Juntao Hu
Sorry to make the previous mail private. My response reposted here: " After a little inspection, I found that elements (window state) are stored in namespace TimeWindow{start=1,end=11}, in your case, and trigger count (trigger state) is stored in namespace TimeWindow{start=1,end=15}

Re: [State Processor API] unable to load the state of a trigger attached to a session window

2022-05-10 Thread Dongwon Kim
I'm using Flink-1.14.4 and failed to load in WindowReaderFunction the >> state of a stateful trigger attached to a session window. >> I found that the following data become available in WindowReaderFunction: >> - the state defined in the ProcessWindowFunction >> - the

RE: Notify on 0 events in a Tumbling Event Time Window

2022-05-10 Thread Schwalbe Matthias
Hi Shilpa, There is no need to have artificial messages in the input kafka topic (and I don’t see where Andrew suggests this 😊 ) However your use case is not 100% clear as to for which keys you want to emit 0-count window results , either: * A) For all keys your job has ever seen (that’s

Re: Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Shilpa Shankar
Thanks Andrew. We did consider this solution too. Unfortunately we do not have permissions to generate artificial kafka events in our ecosystem. Dario, Thanks for your inputs. We will give your design a try. Due the number of events being processed per window, we are using incremental aggregate

Re: Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Dario Heinisch
It depends on the user case,  in Shilpa's use case it is about users so the user ids are probably know beforehand. https://dpaste.org/cRe3G <= This is an example with out an window but essentially Shilpa you would be reregistering the timers every time they fire. You would also have t

Re: Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Andrew Otto
sting into Hive, we filter out the canary events. The ingestion code has work to do and can mark an hour as complete, but still end up writing no events to it. Perhaps you could do the same? Always emit artificial events, and filter them out in your windowing code? The window should still fir

Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Shilpa Shankar
a solution to do the same. The options we considered are below, please let us know if there are other ideas we haven't looked into. [1] Querable State : Save the keys in each of the Process Window Functions. Query the state from an external application and alert when a key is missing afte

Re: [State Processor API] unable to load the state of a trigger attached to a session window

2022-04-24 Thread Dongwon Kim
Can anyone help me with this? Thanks in advance, On Tue, Apr 19, 2022 at 4:28 PM Dongwon Kim wrote: > Hi, > > I'm using Flink-1.14.4 and failed to load in WindowReaderFunction the > state of a stateful trigger attached to a session window. > I found that the following data

[State Processor API] unable to load the state of a trigger attached to a session window

2022-04-19 Thread Dongwon Kim
Hi, I'm using Flink-1.14.4 and failed to load in WindowReaderFunction the state of a stateful trigger attached to a session window. I found that the following data become available in WindowReaderFunction: - the state defined in the ProcessWindowFunction - the registered timers of the sta

Re: UnsupportedOperationException: The end timestamp of a processing-time window cannot become earlier than the current processing time by merging.

2022-04-15 Thread Jai Patel
I dug into this further and I no longer suspect the window describe previously as it does not leverage MergeableWindowAssigner. However, I did identify four in our code that do. They all level ProcessingTimeSessionWindows.withGap. 3 of them use a 500ms gap, while oe uses a 100ms gap. Based on

Re: UnsupportedOperationException: The end timestamp of a processing-time window cannot become earlier than the current processing time by merging.

2022-04-15 Thread Jai Patel
Here's our custom trigger. We thought about switching to ProcessingTimeoutTrigger.of(CountTrigger.of(100, Time.ofMinutes(1)). But I'm not sure that'll trigger properly when the window closes. Thanks. Jai import org.apache.flink.streaming.api.windowing.triggers.Count

UnsupportedOperationException: The end timestamp of a processing-time window cannot become earlier than the current processing time by merging.

2022-04-15 Thread Jai Patel
We are encountering the following error when running our Flink job. We have several processing windows, but it appears to be related to a TumblingProcessingTimeWindow. Checkpoints are failing to complete midway. The code block for the window is: .keyBy(order -> getKey(order)) .win

Re: Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
ompleteness > and Idid a cut and paste lol > > Ok let you know. > > On Mon, 31 Jan 2022 at 17:18, Dario Heinisch > wrote: > >> Then you should be using a process based time window, in your case: >> TumblingProcessingTimeWindows >> >> See >> https://ni

Re: Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
ime window, in your case: > TumblingProcessingTimeWindows > > See > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/ > for more info > On 31.01.22 23:13, John Smith wrote: > > Hi Dario, I don't care about event time I just want

Re: Tumbling window apply will not "fire"

2022-01-31 Thread Dario Heinisch
Then you should be using a process based time window, in your case: TumblingProcessingTimeWindows See https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/ for more info On 31.01.22 23:13, John Smith wrote: Hi Dario, I don't care about event time I

Re: Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
Hi Dario, I don't care about event time I just want to do tumbling window over the "processing time" I.e: count whatever I have in the last 5 minutes. On Mon, 31 Jan 2022 at 17:09, Dario Heinisch wrote: > Hi John > > This is because you are using event time (TumblingEve

Re: Tumbling window apply will not "fire"

2022-01-31 Thread Dario Heinisch
"Kafka Source") .uid(kafkaTopic).name(kafkaTopic) .setParallelism(1) .flatMap(new MapToMyEvent("my-event", "message")) // <--- This works .uid("map-json-logs").name("map-json-logs"); slStream.keyBy(n

Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
ToMyEvent("my-event", "message")) // <--- This works .uid("map-json-logs").name("map-json-logs"); slStream.keyBy(new MinutesKeySelector(windowSizeMins)) // <--- This prints twice .window(TumblingEventTimeWindows.of(Time.min

Re: Window function - flush on job stop

2022-01-24 Thread Caizhi Weng
ource detects this, it sends out a record with a very large watermark to cut off the session. Lars Skjærven 于2022年1月21日周五 20:01写道: > We're doing a stream.keyBy().window().aggregate() to aggregate customer > feedback into sessions. Every now and then we have to update the job, e.g. >

Window function - flush on job stop

2022-01-21 Thread Lars Skjærven
We're doing a stream.keyBy().window().aggregate() to aggregate customer feedback into sessions. Every now and then we have to update the job, e.g. change the key, so that we can't easlily continue from the previous state. Cancelling the job (without restarting from last savepoint) will

Re: Window Top N for Flink 1.12

2021-12-23 Thread Jing Zhang
Hi Jing, Please try this way, Only create one sink for final output, write the window aggregate and topN in one query, write the result of topN into the final sink. Best, Jing Zhang Jing 于2021年12月24日周五 03:13写道: > Hi Jing Zhang, > > Thanks for the reply! My current implementation is

Re: Window Top N for Flink 1.12

2021-12-23 Thread Jing
TRING, channel_id STRING, window_end BIGINT, num_select BIGINT) " ) This also doesn't work. Thanks, Jing On Thu, Dec 23, 2021 at 1:20 AM Jing Zhang wrote: > Hi Jing, > In fact, I agree with you to use TopN [2] instead of Window TopN[1] by > normalizing > time into a unit

Re: Re: Re: Re:Re: Window Aggregation and Window Join ability not work properly

2021-12-23 Thread Martijn Visser
> > *Sender:*Yun Gao > *Send Date:*Thu Dec 23 17:05:33 2021 > *Recipients:*cy > *CC:*'user@flink.apache.org' > *Subject:*Re: Re:Re: Window Aggregation and Window Join ability not work > properly > >> Hi Caiyi, >> >> I think if the

Re: Window Top N for Flink 1.12

2021-12-23 Thread Jing Zhang
Hi Jing, In fact, I agree with you to use TopN [2] instead of Window TopN[1] by normalizing time into a unit with 5 minute, and add it to be one of partition keys. Please note two points when use TopN 1. the result is an update stream instead of append stream, which means the result sent might be

Re:Re: Re: Re:Re: Window Aggregation and Window Join ability not work properly

2021-12-23 Thread cy
inal Mail -- Sender:Yun Gao Send Date:Thu Dec 23 17:05:33 2021 Recipients:cy CC:'user@flink.apache.org' Subject:Re: Re:Re: Window Aggregation and Window Join ability not work properly Hi Caiyi, I think if the image shows all the records, after the change we should on

Re: Re: Re:Re: Window Aggregation and Window Join ability not work properly

2021-12-23 Thread Yun Gao
Sorry I mean 16:00:05, but it should be similar. --Original Mail -- Sender:Yun Gao Send Date:Thu Dec 23 17:05:33 2021 Recipients:cy CC:'user@flink.apache.org' Subject:Re: Re:Re: Window Aggregation and Window Join ability not work properly Hi Caiyi,

Re: Re:Re: Window Aggregation and Window Join ability not work properly

2021-12-23 Thread Yun Gao
Hi Caiyi, I think if the image shows all the records, after the change we should only have the watermark at 16:05, which is still not be able to trigger the window of 5 minutes? Best, Yun --Original Mail -- Sender:cy Send Date:Thu Dec 23 15:44:23 2021

Re: Window Top N for Flink 1.12

2021-12-23 Thread Jing Zhang
Hi Jing, I'm afraid there is no possible to Window TopN in SQL on 1.12 version because window TopN is introduced since 1.13. > I saw the one possibility is to create a table and insert the aggregated data to the table, then do top N like [1]. However, I cannot make this approach work b

Window Top N for Flink 1.12

2021-12-23 Thread Jing
Hi, Flink community, Is there any existing code I can use to get the window top N with Flink 1.12? I saw the one possibility is to create a table and insert the aggregated data to the table, then do top N like [1]. However, I cannot make this approach work because I need to specify the connector

Re:Re: Window Aggregation and Window Join ability not work properly

2021-12-22 Thread cy
I change to watermark for `datatime` as `datatime` - interval '1' second or watermark for `datatime` as `datatime` but is still not work. At 2021-12-23 15:16:20, "Yun Gao" wrote: Hi Caiyi, The window need to be finalized with watermark[1]. I noticed that th

Re: Window Aggregation and Window Join ability not work properly

2021-12-22 Thread Yun Gao
Hi Caiyi, The window need to be finalized with watermark[1]. I noticed that the watermark defined is `datatime` - INTERVAL '5' MINUTE, it means the watermark emitted would be the maximum observed timestamp so far minus 5 minutes [1]. Therefore, if we want to trigger the window of

Re: PyFlink SQL window aggregation data not written out to file

2021-11-22 Thread Guoqin Zheng
Hi Roman, Thanks for the update and testing locally. This is very informative. -Guoqin On Mon, Nov 22, 2021 at 10:55 AM Roman Khachatryan wrote: > Hi Guoqin, > > I was able to reproduce the problem locally. I can see that at the > time of window firing the services are already cl

Re: PyFlink SQL window aggregation data not written out to file

2021-11-22 Thread Roman Khachatryan
Hi Guoqin, I was able to reproduce the problem locally. I can see that at the time of window firing the services are already closed because of the emitted MAX_WATERMARK. Previously, there were some discussions around waiting for all timers to complete [1], but AFAIK there was not much demand to

Re: PyFlink SQL window aggregation data not written out to file

2021-11-17 Thread Guoqin Zheng
> Hi Guoqin, > > Thanks for the clarification. > > Processing time windows actually don't need watermarks: they fire when > window end time comes. > But the job will likely finish earlier because of the bounded input. > > Handling of this case was improved in 1.14

Re: PyFlink SQL window aggregation data not written out to file

2021-11-17 Thread Roman Khachatryan
Hi Guoqin, Thanks for the clarification. Processing time windows actually don't need watermarks: they fire when window end time comes. But the job will likely finish earlier because of the bounded input. Handling of this case was improved in 1.14 as part of FLIP-147, as well as in pre

Re: PyFlink SQL window aggregation data not written out to file

2021-11-15 Thread Guoqin Zheng
Hi Roman, Thanks for your quick response! Yes, it does seem to be the window closing problem. So if I change the tumble window on eventTime, which is column 'requestTime', it works fine. I guess the EOF of the test data file kicks in a watermark of Long.MAX_VALUE. But my application

Re: PyFlink SQL window aggregation data not written out to file

2021-11-15 Thread Roman Khachatryan
do a local test of my pyflink sql application. The sql > query does a very simple job: read from source, window aggregation and write > result to a sink. > > In the production, the source and sink will be kinesis stream. But since I > need to do a local test, I am mocking out t

PyFlink SQL window aggregation data not written out to file

2021-11-15 Thread Guoqin Zheng
d from source, window aggregation and write result to a sink. In the production, the source and sink will be kinesis stream. But since I need to do a local test, I am mocking out the source and sink with the local filesystem. The problem is the local job always runs successfully, but the results were

Re: JVM cluster not firing event time window

2021-11-10 Thread Caizhi Weng
new WatermarkStrategy[Int] { > override def createWatermarkGenerator(context: > WatermarkGeneratorSupplier.Context): WatermarkGenerator[Int] = new > AscendingTimestampsWatermarks[Int] > > override def createTimestampAssigner(context: > TimestampAssignerSupplier.Context):

  1   2   3   4   5   6   7   8   9   10   >