Re: How can we read checkpoint data for debugging state

2025-02-20 Thread Xuyang
Hi, FYI, hope this helps: FLIP-496: SQL connector for keyed savepoint data [1] [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-496%3A+SQL+connector+for+keyed+savepoint+data -- Best! Xuyang At 2025-02-21 12:11:02, "Sachin Mittal" wrote: Hi, So I have a flink ap

Re: FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
Another option would be to add an extra field, like: ARRAY_REMOVE(IFNULL(LAG(zoneIds, 1) OVER (PARTITION BY msisdn ORDER BY *ts, row_number*), ARRAY[-1]), -1) AS prev_zoneIds, but it isn't possible either. On Thu, Feb 20, 2025 at 11:48, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.c

Question about large windows and RocksDB state backend

2025-02-20 Thread Gabriele Mencagli
Dear Flink Community, I am working with an application developed using the DataStream API. The application has a window operator with a large temporal window. Due to the processing done, we are using the ProcessWindowFunction implementation, as we need to explicitly buffer all the tuples
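A minimal sketch of the setup being described, for readers who want to reproduce it; the event POJO, the 24-hour window size, and the Duration-based assigner overload (Flink 1.19+) are assumptions, not details from the original mail. With RocksDB as the state backend, the Iterable handed to ProcessWindowFunction is read back from keyed window state on disk, so all buffered tuples live in state until the window fires:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class LargeWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(new Event("k", 1_000L), new Event("k", 2_000L))
           .assignTimestampsAndWatermarks(
                   WatermarkStrategy.<Event>forMonotonousTimestamps()
                           .withTimestampAssigner((e, ts) -> e.timestamp))
           .keyBy(e -> e.key)
           // Large temporal window: every tuple is buffered in keyed window
           // state until the window fires; with RocksDB that buffer is on disk.
           .window(TumblingEventTimeWindows.of(Duration.ofHours(24)))
           .process(new ProcessWindowFunction<Event, String, String, TimeWindow>() {
               @Override
               public void process(String key, Context ctx, Iterable<Event> events,
                                   Collector<String> out) {
                   long count = 0;
                   for (Event e : events) {  // iterates the buffered tuples from state
                       count++;
                   }
                   out.collect(key + " -> " + count + " tuples");
               }
           })
           .print();
        env.execute("large-window-sketch");
    }

    /** Hypothetical event POJO standing in for the application's real type. */
    public static class Event {
        public String key;
        public long timestamp;
        public Event() {}
        public Event(String key, long timestamp) { this.key = key; this.timestamp = timestamp; }
    }
}
```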

Re: FlinkSQL LAG, strange behavior?

2025-02-20 Thread Alessio Bernesco Làvore
Hello, I don't know if it's strictly related, but using a TIMESTAMP_LTZ vs. TIMESTAMP ts I ran into some behavior I couldn't explain. Changing the ts to TIMESTAMP (not LTZ), the identical query works as expected. So, first of all, we changed all the parsing to TIMESTAMP to exclude this kind of problem. If
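For anyone comparing the two types: TIMESTAMP_LTZ values are instants interpreted through the session time zone, while plain TIMESTAMP is time-zone-free wall-clock time, so the same query can order and window differently under each. A small sketch of the two declarations side by side; the table names and the datagen stand-in for the Kafka source are assumptions:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LtzVsTimestampSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // TIMESTAMP_LTZ values are rendered and cast through this session
        // option; plain TIMESTAMP ignores it entirely.
        tEnv.getConfig().set("table.local-time-zone", "UTC");

        // Same event-time column modeled both ways.
        tEnv.executeSql(
                "CREATE TABLE t_wallclock (" +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts" +
                ") WITH ('connector' = 'datagen')");
        tEnv.executeSql(
                "CREATE TABLE t_instant (" +
                "  ts TIMESTAMP_LTZ(3)," +
                "  WATERMARK FOR ts AS ts" +
                ") WITH ('connector' = 'datagen')");
    }
}
```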

Re: FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
I tried to create an additional field based on ROW_NUMBER, but it can't be used in the LAG function. The idea was: WITH ranked AS ( SELECT msisdn, eventTimestamp, zoneIds, ts, TO_TIMESTAMP_LTZ(ROW_NUMBER() OVER (PARTITION BY msisdn ORDER BY ts), 3) AS ts_ranke

FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
I have created a table that reads from a Kafka topic. What I want to do is order the data by eventTime and add a new field that represents the previous value using the LAG function. The problem arises when two records have exactly the same eventTime, which produces a "strange" behavior. CREATE T
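A minimal reproduction sketch of the pattern in question; the schema and the datagen stand-in for the Kafka table are assumptions. Rows that share the same ts are peers in the OVER window, so LAG has no defined order between them, which is consistent with the "strange" behavior reported for duplicate event times:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LagOverSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical stand-in for the Kafka-backed table from the thread.
        tEnv.executeSql(
                "CREATE TABLE example (" +
                "  msisdn STRING," +
                "  zoneIds ARRAY<INT>," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH ('connector' = 'datagen', 'rows-per-second' = '1')");

        // Previous value per key, ordered by event time. With duplicate ts
        // values the peer rows have no defined order, so prev_zoneIds for
        // ties can come out either way.
        tEnv.executeSql(
                "SELECT msisdn, ts, zoneIds," +
                "  LAG(zoneIds, 1) OVER (PARTITION BY msisdn ORDER BY ts) AS prev_zoneIds " +
                "FROM example").print();
    }
}
```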

Re:Re: Performance problem with FlinkSQL

2025-02-20 Thread Xuyang
Hi, There can be various reasons for performance issues, such as: 1. Bottlenecks in the Kafka connector when reading data 2. Time-consuming filter conditions 3. Bottlenecks in the Kafka connector when writing data To further diagnose the problem, you can try: 1. Set the Flink configuration `
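The specific configuration key is cut off in the preview, so it stays unknown here. As one example of the kind of diagnostic toggle used for this sort of bottleneck hunting (an assumption, not necessarily the setting Xuyang meant), Flink's operator flame graphs can be enabled to see which operator burns the CPU:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlameGraphSketch {
    public static void main(String[] args) {
        // Local sketch: enable per-subtask flame graphs in the web UI so a
        // time-consuming filter or a slow sink shows up directly.
        // (Requires flink-runtime-web on the classpath.)
        Configuration conf = new Configuration();
        conf.setString("rest.flamegraph.enabled", "true");
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
        // ... build the SQL/DataStream job on 'env' as usual ...
    }
}
```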

How can we read checkpoint data for debugging state

2025-02-20 Thread Sachin Mittal
Hi, So I have a Flink application which stores state (RocksDB state backend) in S3 with the following directory structure: s3://{bucket}/flink-checkpoints/{job-id}/ + shared/ + taskowned/ + chk-/ I have my job pipeline defined like: final DataStream e
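For archive readers: the State Processor API can open such a chk- directory directly. A minimal sketch, where the checkpoint path, the operator uid "my-operator-uid", and the ValueState named "count" are placeholders to be replaced with the job's real values:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ReadCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Point at one concrete chk- directory of the job.
        SavepointReader reader = SavepointReader.read(
                env, "s3://{bucket}/flink-checkpoints/{job-id}/chk-42",
                new EmbeddedRocksDBStateBackend());
        reader.readKeyedState("my-operator-uid", new ReaderFn()).print();
        env.execute("read-checkpoint");
    }

    /** Reads one ValueState per key and emits it for inspection. */
    static class ReaderFn extends KeyedStateReaderFunction<String, String> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            // Must match the descriptor the job used when writing the state.
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
            out.collect(key + " -> " + count.value());
        }
    }
}
```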

Re: How can we read checkpoint data for debugging state

2025-02-20 Thread Sachin Mittal
Hi, I am working on Flink 1.19.1, so I guess I cannot use the SQL connector, as that's to be released with 2.1.0. If I directly try to use the flink-state-processor-api, my very first question is: how do I know the uuid for each keyed state? Is it the step name, as in events.keyBy(new MyKeySelector(
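For context on that question: readKeyedState matches the uid assigned with .uid(...) on the stateful operator, not the step name; if no explicit uid was set when the checkpoint was taken, Flink derived one from the job graph, which is why setting explicit uids is the usual recommendation. A minimal sketch, with hypothetical names, of tagging the operator so the State Processor API can address it:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("a", "b")
           .keyBy(s -> s)
           .map(new MapFunction<String, String>() {   // stand-in for the stateful step
               @Override
               public String map(String value) { return value; }
           })
           .uid("my-keyed-state")  // readKeyedState() matches on this uid, not the step name
           .print();
        env.execute("uid-sketch");
    }
}
```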

SASL_SSL sink kafka

2025-02-20 Thread nick toker
Hello, We are encountering a connection issue with our Kafka sink when using AWS MSK IAM authentication. While our Kafka source connects successfully, the sink fails to establish a connection. Here's how we're building the sink: ```java KafkaSinkBuilder builder = KafkaSink.builder() .setBootst
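A common cause of source-connects-but-sink-fails is that the sink's producer needs the SASL properties set explicitly too; they are not inherited from the source. A self-contained sketch of a KafkaSink wired for MSK IAM, where the broker endpoint and topic are placeholders and the four producer properties are the standard aws-msk-iam-auth settings:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class MskIamSinkSketch {
    public static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("b-1.example.amazonaws.com:9098")  // placeholder
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic")                        // placeholder
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                // Standard aws-msk-iam-auth client settings, set on the sink itself:
                .setProperty("security.protocol", "SASL_SSL")
                .setProperty("sasl.mechanism", "AWS_MSK_IAM")
                .setProperty("sasl.jaas.config",
                        "software.amazon.msk.auth.iam.IAMLoginModule required;")
                .setProperty("sasl.client.callback.handler.class",
                        "software.amazon.msk.auth.iam.IAMClientCallbackHandler")
                .build();
    }
}
```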

Re: FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
And another option I tried is: WITH ranked AS ( SELECT * FROM ( SELECT msisdn, eventTimestamp, zoneIds, ts, ROW_NUMBER() OVER (PARTITION BY msisdn, ts ORDER BY ts) AS rownum FROM example) WHERE rownum = 1 ) SELECT msisdn, eventTimestam
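A runnable sketch of that dedup shape, under the same assumed datagen schema as above; whether the planner accepts this exact PARTITION BY msisdn, ts with a rowtime ORDER BY is the open question in the thread, so this is the shape being attempted rather than a confirmed fix:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DedupSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical stand-in for the Kafka-backed table from the thread.
        tEnv.executeSql(
                "CREATE TABLE example (" +
                "  msisdn STRING," +
                "  zoneIds ARRAY<INT>," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH ('connector' = 'datagen', 'rows-per-second' = '1')");

        // ROW_NUMBER ... WHERE rownum = 1 is Flink SQL's deduplication shape:
        // keep one row per (msisdn, ts) before applying LAG downstream.
        tEnv.executeSql(
                "SELECT msisdn, zoneIds, ts FROM (" +
                "  SELECT *," +
                "    ROW_NUMBER() OVER (PARTITION BY msisdn, ts ORDER BY ts) AS rownum" +
                "  FROM example" +
                ") WHERE rownum = 1").print();
    }
}
```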

Re: Performance problem with FlinkSQL

2025-02-20 Thread Guillermo Ortiz Fernández
I have been configuring the Flink cluster (application mode) to process the Kafka data volume. The current configuration consists of 16 pods, each with *12GB of memory and 2 CPUs*. Each TaskManager has *4 slots*. All processing is done using *Flink SQL*, and since Kafka topics may contain out-of-o
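For reference, a sketch of the typed configuration equivalent to the slot and memory numbers mentioned (4 slots per TaskManager, 12GB process memory); in a real application-mode deployment these values live in the Flink config and pod resource specs rather than in code:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.configuration.TaskManagerOptions;

public class ClusterConfigSketch {
    public static void main(String[] args) {
        // Typed equivalents of taskmanager.numberOfTaskSlots and
        // taskmanager.memory.process.size for the described layout.
        Configuration conf = new Configuration();
        conf.set(TaskManagerOptions.NUM_TASK_SLOTS, 4);
        conf.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("12g"));
        System.out.println(conf);
    }
}
```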