TableAggregation, FlinkSql - Top2 example

2025-03-14 Thread Guillermo
I'm trying to execute the AggregationTable example ( https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/functions/udfs/#table-aggregate-functions) from FlinkSQL and I don't get it. I have copied the example, compiled it, and created the function. When I try to execute it: Fl

Re: FlinkSQL LAG, strange behavior?

2025-03-12 Thread Guillermo Ortiz Fernández
single block, resulting in a "curious" behavior. On Thu, 20 Feb 2025 at 15:56, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) wrote: > And another option I tried is: > > WITH ranked AS ( > select * > FROM ( > SELECT >

Re: Re: Performance problem with FlinkSQL

2025-02-21 Thread Guillermo Ortiz Fernández
kpressure, the calc operator could be the bottleneck, indicating that > the filtering logic is CPU-intensive). > > 3. > > ``` > > and the process has been running for several days without failures, > although with *high CPU and memory usage*. > > ``` > > If CPU

Re: FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
ption: StreamPhysicalOverAggregate doesn't support consuming update and delete changes which is produced by node Deduplicate(keep=[FirstRow], key=[msisdn, ts], order=[ROWTIME]) On Thu, 20 Feb 2025 at 12:00, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) wrote: > Another opt

Re: Performance problem with FlinkSQL

2025-02-20 Thread Guillermo Ortiz Fernández
"task" > DataStream I'm sure you can do it. > > > > Att, > Pedro Mázala > +31 (06) 3819 3814 > Be awesome > > > On Wed, 29 Jan 2025 at 22:06, Guillermo Ortiz Fernández < > guillermo.ortiz.f...@gmail.com> wrote: > >> Aft

Re: FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
Another option would be to add an extra field like: ARRAY_REMOVE(IFNULL(LAG(zoneIds, 1) OVER (PARTITION BY msisdn ORDER BY *ts, row_number*), ARRAY[-1]), -1) AS prev_zoneIds, But that isn't possible either. On Thu, 20 Feb 2025 at 11:48, Guillermo Ortiz Fernández (< guillermo

Re: FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
figured out. > > Greetings, > Alessio > > On Thu, Feb 20, 2025 at 10:31 AM Guillermo Ortiz Fernández < > guillermo.ortiz.f...@gmail.com> wrote: > >> I have created a table that reads from a Kafka topic. What I want to do >> is order the data by eventTime an

FlinkSQL LAG, strange behavior?

2025-02-20 Thread Guillermo Ortiz Fernández
I have created a table that reads from a Kafka topic. What I want to do is order the data by eventTime and add a new field that represents the previous value using the LAG function. The problem arises when two records have exactly the same eventTime, which produces a "strange" behavior. CREATE T

Re: Performance problem with FlinkSQL

2025-01-29 Thread Guillermo Ortiz Fernández
As of the last check, each pod uses about 200-400 millicores and 2.2 GB. On Wed, 29 Jan 2025 at 21:41, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) wrote: > I have a job entirely written in Flink SQL. The first part of the program > processes 10 input topics a

Performance problem with FlinkSQL

2025-01-29 Thread Guillermo Ortiz Fernández
I have a job entirely written in Flink SQL. The first part of the program processes 10 input topics and generates one output topic with normalized messages and some filtering applied (very simple: a few WHERE conditions on fields and some substring operations). Nine of the topics produce between hundreds and thousands of mess

Issues reading kafka avro records with ARRAY field from FlinkSQL table.

2025-01-14 Thread Guillermo Ortiz Fernández
I have created the example table to read data that follows the schema defined in example.avro. > *CREATE TABLE example (* * fieldA STRING,* * fieldB INT,* * arrayField ARRAY* *) WITH (* *'connector' = 'kafka',* *'topic' = 'test_flink',* *'properties.bootstrap.servers' = '***

Resource consumption in idle tasks

2025-01-02 Thread Guillermo
Let's suppose we have two topics, one with 20 partitions and another with 1 partition. If we set parallelism.default to 10, I understand that it will create 10 subtasks in total for each source. In the case of the topic with 20 partitions, it will work correctly, but for the topic with only one par

Re: FlinkSQL, define parallelism as HINT or CREATE

2024-12-27 Thread Guillermo Ortiz Fernández
afka connector and using dynamic table option hints[1] to achieve your > requirements. > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options > > > -- > Best! > Xuyang > > > At 2024-12-26 20:2
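The dynamic table option hints mentioned in this reply attach connector options to a single table reference inside a query. A minimal sketch of the syntax follows, assuming a Kafka-backed table named `kafka_events` (a hypothetical name, not from the thread). The option shown, `scan.startup.mode`, is a standard Kafka connector option used here only to illustrate where options go; whether a per-source parallelism option is accepted depends on the connector version in use and should be checked against its documentation.

```sql
-- `kafka_events` is a hypothetical table name; substitute your own Kafka-backed table.
-- The OPTIONS hint overrides connector options for this query only,
-- leaving the table definition in the catalog unchanged.
SELECT *
FROM kafka_events /*+ OPTIONS('scan.startup.mode' = 'earliest-offset') */;
```

Several options can be listed in the same hint, separated by commas, so tables with very different partition counts could each carry their own settings without changing the CREATE statements.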

Re: Re: Flink and default parallelism, problems with state functions

2024-12-26 Thread Guillermo Ortiz Fernández
he issue lies with the Kafka source not > reading any data, rather than being related to the stateful over agg with > expression "ARRAY_REMOVE(IFNULL(LAG(zoneIds, 1) OVER (PARTITION BY code > ORDER BY ts), ARRAY[-1]), -1) AS prev_zoneIds.". > > > -- > Best!

FlinkSQL, define parallelism as HINT or CREATE

2024-12-26 Thread Guillermo Ortiz Fernández
I'm using Flink SQL and have several Kafka topics with different partition counts, ranging from 4 to 320 partitions. Is it possible to specify the level of parallelism as a HINT in queries or CREATE statements? If I don't define any, it defaults to parallelism.default. However, since the entire pro

Re: Flink and default parallelism, problems with state functions

2024-12-20 Thread Guillermo Ortiz Fernández
> progress of over agg. Please check if the watermark is not progressing (or > perhaps it is stuck at Long.Min). > > > *Solutions*: > > 1. (Recommended) Set `table.exec.source.idle-timeout`[1] to ignore > alignment for subtasks that have not emitted a watermark af
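The recommended option in this reply, `table.exec.source.idle-timeout`, can be set per session from the SQL client. A minimal sketch follows; the `30 s` value is an illustrative assumption, not a figure from the thread, and should be tuned to how long a source subtask can reasonably go without data.

```sql
-- Mark a source subtask as idle after 30 seconds without records,
-- so that it no longer holds back the watermark of downstream operators.
-- The 30 s value is only an example.
SET 'table.exec.source.idle-timeout' = '30 s';
```

This matters when the default parallelism exceeds the partition count of a topic: subtasks with no assigned partition never emit a watermark, and without an idle timeout the over aggregation waits on them indefinitely.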

Flink parallelism with Kafka source

2024-12-20 Thread Guillermo Ortiz Fernández
I'm looking for how Flink defines parallelism for a Kafka source ( https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/). How is it determined by default? Is it based on the number of partitions in the topic? I have some topics with hundreds of partitions, and such a hi

Flink and default parallelism, problems with state functions

2024-12-19 Thread Guillermo Ortiz Fernández
I am working on a program using Flink SQL. I initially developed it in a Docker Compose environment where I run Flink and Kafka, and everything worked correctly. However, when I deployed it to another environment, it stopped working. The difference I noticed in the new environment, which only runs

Re: Trying to connect to HBase from FlinkSql

2024-11-13 Thread Guillermo
lib/flink-connector-hbase-2.2-1.17.2.jar Any idea? On Wed, 13 Nov 2024 at 5:07, Shengkai Fang wrote: > Hi. > > You should put the hbase jar into the ${FLINK_HOME}/lib directory. > > Best, > Shengkai > > Guillermo wrote on Tue, 12 Nov 2024 at 19:28: > >> I'm using

Re: Trying to connect to HBase from FlinkSql

2024-11-12 Thread Guillermo
I'm using Flink 1.20 because I need some features from this version. I have seen that the HBase connector is not available for this version; I don't know whether this is the problem or whether it could be compatible. On Tue, 12 Nov 2024 at 11:56, Guillermo wrote: > I'm

Trying to connect to HBase from FlinkSql

2024-11-12 Thread Guillermo
I'm trying to connect HBase 1.2 with FlinkSQL (I have tried with HBase 2.2 and the Flink connector 2.2 with the same result). The test is done with docker-compose and HBase. Do I have to do something else? I have checked that the table is okay in HBase. root@70e2d4767504:/opt/flink/lib# ls flink-cep-1.

Order data with OVER AGGREGATION functions.

2024-11-11 Thread Guillermo
I am running several queries in FlinkSQL, and in a final step before inserting into Kafka, I perform an ORDER BY eventTime. When I look at the execution plan, I see Exchange(distribution=[single]). Does this mean that all the data is going to a single node and getting reordered there? I haven't bee

Re: Re: FlinkSQL Hints - Broadcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
We are trying to migrate a Kafka Streams application to FlinkSQL. The Kafka Streams app uses GKTables to avoid shuffles for the lookup tables. Is there an equivalent option in Flink? On Mon, 4 Nov 2024 at 11:27, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) wrote: > The small

Re: Re: FlinkSQL Hints - Broadcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
04 17:54:36, "Xuyang" wrote: > > Hi, > > The BROADCAST[1] join hint currently applies only to batch mode. > > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#broadcast > [1] > > > -- > Best! >

FlinkSQL Hints - Broadcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
Hi, I'm running a simple query that joins two tables, where one table is much larger than the other, with the second table being very small. I believe it would be optimal to use a broadcast on the second table for the join. All my tests are being done locally, with very little data in either table

Trying to understand watermark in join with FlinkSQL and late events

2024-10-30 Thread Guillermo
I'm trying to understand how watermarks work in FlinkSQL. I’ve created the following tables: CREATE TABLE currency_rates ( currency STRING, conversion_rate STRING, update_time TIMESTAMP(3), WATERMARK FOR update_time AS update_time, PRIMARY KEY(currency) NOT ENFO