Re:Re:Re: Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Xuyang
Maybe you need to use a lookup table[1] for the small side. In that case, the probe side (the big table side) will not be shuffled. However, lookup join only supports processing time, and the changelog on the small table side will not be captured... [1] https://nightlies.apache.org/flink/fl
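
For illustration, a processing-time lookup join in Flink SQL might look like the sketch below. The table and column names (`big_table`, `small_dim`, `proc_time`, `k`, `payload`, `label`) are hypothetical, and `small_dim` is assumed to be backed by a connector that implements the lookup interface (JDBC is one such connector).

```sql
-- Hypothetical tables: big_table is the probe side, small_dim is the lookup side.
-- Assumes big_table declares a processing-time attribute, e.g.
--   proc_time AS PROCTIME()
SELECT b.k, b.payload, d.label
FROM big_table AS b
JOIN small_dim FOR SYSTEM_TIME AS OF b.proc_time AS d
  ON b.k = d.k;
```

Each probe-side row is enriched with whatever value the lookup returns at that moment. Later changes to `small_dim` do not update results that were already emitted, which matches the caveat about the changelog above.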

Re: Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Shengkai Fang
I think you may need to dump the upsert-kafka data to some storage that accepts CDC data, e.g. Paimon or Hudi, and then look up the data in that data lake storage. But Flink SQL doesn't support event-time lookup join. Best, Shengkai

Re:Re: Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Xuyang
Hi. As far as I know, the upsert-kafka connector does not implement the lookup table interface. Therefore, what you’re describing resembles a temporal join[1]. Like a regular stream join in stream processing, it currently cannot replace the shuffle hash edge with a broadcast. By the way, this issue[2
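
As a rough sketch of the temporal join mentioned here, assuming the small side is an upsert-kafka table with a primary key and a watermark (so that it can act as a versioned table). Every table name, column name, topic and server address below is hypothetical, and `big_table` is assumed to declare an event-time attribute `event_time` of type TIMESTAMP(3) with its own watermark.

```sql
-- Hypothetical versioned table on the small side.
CREATE TABLE small_table (
  k STRING,
  label STRING,
  update_time TIMESTAMP(3),
  WATERMARK FOR update_time AS update_time,
  PRIMARY KEY (k) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'small-topic',                              -- assumed topic name
  'properties.bootstrap.servers' = 'localhost:9092',    -- assumed address
  'key.format' = 'json',
  'value.format' = 'json'
);

-- Event-time temporal join: each big_table row is joined with the version
-- of small_table that was valid at b.event_time.
SELECT b.k, b.payload, s.label
FROM big_table AS b
JOIN small_table FOR SYSTEM_TIME AS OF b.event_time AS s
  ON b.k = s.k;
```

As the message notes, this join is still keyed, so both inputs are hash-shuffled on `k` rather than broadcast.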

Re: Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
We are trying to migrate a Kafka Streams application to Flink SQL. The Kafka Streams app uses GlobalKTables to avoid shuffles for the lookup tables. Is there an equivalent option in Flink? On Mon, 4 Nov 2024 at 11:27, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) wrote: > The small table use u

Re: Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
The small table uses upsert-kafka, which doesn't support lookup tables. Do you know of another possibility? Thanks. On Mon, 4 Nov 2024 at 11:02, Xuyang () wrote: > Additionally, does the lookup table with CACHE[1][2] meet your needs? If > so, you might need to use or implement a dimension table con

Re:Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Xuyang
Additionally, does the lookup table with CACHE[1][2] meet your needs? If so, you might need to use or implement a dimension table connector with cache. [1] https://issues.apache.org/jira/browse/FLINK-28415 [2] https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/j

Re:FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Xuyang
Hi, The BROADCAST[1] join hint currently applies only to batch mode. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#broadcast -- Best! Xuyang At 2024-11-04 17:06:59, "Guillermo Ortiz Fernández" wrote: Hi, I'm running a s
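
For reference, the hint syntax under discussion looks roughly like the sketch below. The table and column names are hypothetical. As stated above, the hint only applies when the job runs in batch mode, and it only applies to joins with an equality condition.

```sql
-- Hypothetical tables; the hint names the side that should be broadcast.
-- Only applies in batch execution mode.
SELECT /*+ BROADCAST(small_table) */ b.k, b.payload, s.label
FROM big_table AS b
JOIN small_table AS s
  ON b.k = s.k;
```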

FlinkSQL Hints - Boradcast table.

2024-11-04 Thread Guillermo Ortiz Fernández
Hi, I'm running a simple query that joins two tables, where one table is much larger than the other, with the second table being very small. I believe it would be optimal to use a broadcast on the second table for the join. All my tests are being done locally, with very little data in either table