Hi, casel.
一般情况下,维表侧也都会尽量做下推的。
比如对于:
```
SELECT * FROM MyTable AS T
JOIN LookupTable FOR SYSTEM_TIME AS OF T.proctime AS D
ON T.a = D.id AND D.age = 10
WHERE T.c > 1000
```
会优化成
```
Calc(select=[a, b, c, PROCTIME_MATERIALIZE(proctime) AS proctime, rowtime, id,
name, CAST(10 AS INTEGER) AS age])
+- LookupJoin(table=[default_catalog.default_database.LookupTable],
joinType=[InnerJoin], lookup=[age=10, id=a], where=[(age = 10)], select=[a, b,
c, proctime, rowtime, id, name])
+- Calc(select=[a, b, c, proctime, rowtime], where=[(c > 1000)])
+- DataStreamScan(table=[[default_catalog, default_database, MyTable]],
fields=[a, b, c, proctime, rowtime])
```
可以看到age = 10也会作为lookup join的key去维表查询。
可以把你的plan贴出来看看吗?
--
Best!
Xuyang
在 2024-11-06 19:47:21,"Hongshun Wang" <[email protected]> 写道:
> 谓词下推取决于connector实现
>
>On Tue, Nov 5, 2024 at 2:48 PM casel.chen <[email protected]> wrote:
>
>> 场景是使用flink
>> sql流表lookup关联维表(一对多,部分数据会出现一条关联出上千条结果)后只取其中一条,这会导致维表查询压力非常大,像这种有没有办法在flink
>> sql层面进行谓词下推?
>> 从flink UI上看该lookup join算子的输出数据量是输入数据量的几百倍