Re: Performance problem with FlinkSQL

2025-01-29 Thread Guillermo Ortiz Fernández
After last checking it uses about 200-400 millicores each pod and 2.2Gb. El mié, 29 ene 2025 a las 21:41, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) escribió: > I have a job entirely written in Flink SQL. The first part of the program > processes 10 input topics a

Performance problem with FlinkSQL

2025-01-29 Thread Guillermo Ortiz Fernández
I have a job entirely written in Flink SQL. The first part of the program processes 10 input topics and generates one output topic with normalized messages and some filtering applied (really easy, some where by fields and substring). Nine of the topics produce between hundreds and thousands of mess

Issues reading kafka avro records with ARRAY field from FlinkSQL table.

2025-01-14 Thread Guillermo Ortiz Fernández
I have created the example table to read data that follows the schema defined in example.avro. > *CREATE TABLE example (* * fieldA STRING,* * fieldB INT,* * arrayField ARRAY* *) WITH (* *'connector' = 'kafka',* *'topic' = 'test_flink',* *'properties.bootstrap.servers' = '***

Re: FlinkSQL, define parallelism as HINT or CREATE

2024-12-27 Thread Guillermo Ortiz Fernández
afka connector and using dynamic table option hints[1] to achieve your > requirements. > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options > > > -- > Best! > Xuyang > > > At 2024-12-26 20:2

Re: Re: Flink and default parallelims, problems with state functions

2024-12-26 Thread Guillermo Ortiz Fernández
he issue lies with the Kafka source not > reading any data, rather than being related to the stateful over agg with > expression "ARRAY_REMOVE(IFNULL(LAG(zoneIds, 1) OVER (PARTITION BY code > ORDER BY ts), ARRAY[-1]), -1) AS prev_zoneIds.". > > > -- > Best!

FlinkSQL, define parallelism as HINT or CREATE

2024-12-26 Thread Guillermo Ortiz Fernández
I'm using Flink SQL and have several Kafka topics with different partition counts, ranging from 4 to 320 partitions. Is it possible to specify the level of parallelism as a HINT in queries or CREATE statements? If I don't define any, it defaults to parallelism.default. However, since the entire pro

Re: Flink and default parallelims, problems with state functions

2024-12-20 Thread Guillermo Ortiz Fernández
> progress of over agg. Please check if the watermark is not progressing (or > perhaps it is stuck at Long.Min). > > > *Solutions*: > > 1. (Recommended) Set `table.exec.source.idle-timeout`[1] to ignore > alignment for subtasks that have not emitted a watermark af

Flink parallelism with Kafka source

2024-12-20 Thread Guillermo Ortiz Fernández
I'm looking for how Flink defines parallelism for a Kafka source ( https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/). How is it determined by default? Is it based on the number of partitions in the topic? I have some topics with hundreds of partitions, and such a hi

Flink and default parallelims, problems with state functions

2024-12-19 Thread Guillermo Ortiz Fernández
I am working on a program using Flink SQL. I initially developed it in a Docker Compose environment where I run Flink and Kafka, and everything worked correctly. However, when I deployed it to another environment, it stopped working. The difference I noticed in the new environment, which only runs

Re: Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
We are trying to migrate a kafka streams applications to FlinkSql. Kafka Streams app uses GKTables to avoid shuffles for the lookup tables. Is there any option to Flink? El lun, 4 nov 2024 a las 11:27, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) escribió: > The small

Re: Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
04 17:54:36,"Xuyang" 写道: > > Hi, > >The BROADCAST[1] join hint currently applies only to batch mode. > > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#broadcast > [1] > > > -- > Best! >

FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
Hi, I'm running a simple query that joins two tables, where one table is much larger than the other, with the second table being very small. I believe it would be optimal to use a broadcast on the second table for the join. All my tests are being done locally, with very little data in either table