Hi Venkata,

Thanks for reporting this issue. Currently, Flink doesn't support nested filter pushdown. I also think this optimization would be useful, especially for jobs that need to read a lot of data from Parquet or ORC files. We haven't moved forward with it because other work took priority.

Regarding your three questions, I will respond after my on-call shift is finished, because I need to dive into the source code first. About your commit, I don't think it's the right solution, because FieldReferenceExpression doesn't currently support nested field filter pushdown; we may need to extend it. You are also welcome to keep exploring reasonable solutions, and we can discuss them later.

Best,
Ron

Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote on Sat, Jul 29, 2023 at 03:31:

> Hi all,
>
> Currently, I am working on adding support for nested fields filter push
> down. In our use case running Flink on Batch, we found nested fields filter
> push down is key - without it, it is significantly slow. Note: Spark SQL
> supports nested fields filter push down.
>
> While debugging the code using IcebergTableSource as the table source,
> narrowed down the issue to missing support for
> RexNodeExtractor#RexNodeToExpressionConverter#visitFieldAccess.
> As part of fixing it, I made changes by returning an
> Option(FieldReferenceExpression)
> with appropriate reference to the parent index and the child index for the
> nested field with the data type info.
>
> But this new ResolvedExpression cannot be converted to RexNode which
> happens in PushFilterIntoSourceScanRuleBase
> <https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushFilterIntoSourceScanRuleBase.java#L104>.
>
> Few questions
>
> 1. Does FieldReferenceExpression support nested fields currently or should
> it be extended to support nested fields? I couldn't figure this out from
> the PushProjectIntoTableScanRule that supports nested column projection
> push down.
> 2. ExpressionConverter
> <https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/expressions/converter/ExpressionConverter.java#L197>
> converts ResolvedExpression -> RexNode but the new FieldReferenceExpression
> with the nested field cannot be converted to RexNode. This is why the
> answer to the 1st question is key.
> 3. Anything else that I'm missing here? or is there an even easier way to
> add support for nested fields filter push down?
>
> Partially working changes - Commit
> <https://github.com/venkata91/flink/commit/00cdf34ecf9be3ba669a97baaed4b69b85cd26f9>
>
> Please feel free to leave a comment directly in the commit.
>
> Any pointers here would be much appreciated! Thanks in advance.
>
> Disclaimer: Relatively new to Flink code base especially Table planner :-).
>
> Regards
> Venkata krishnan
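
For illustration only: the "parent index and child index" idea from the quoted message can be sketched without any Flink dependency. The sketch below assumes a nested field is addressed by a path of field indices (outermost first) and that a source evaluates a pushed-down equality filter by walking that path. The class NestedFieldPathSketch, its methods, and the row layout are hypothetical names made up for this example; they are not part of Flink's API, and the sketch does not claim how FieldReferenceExpression should actually be extended.

```java
/** Hypothetical, Flink-independent model of a nested field reference. */
public class NestedFieldPathSketch {

    /**
     * Resolves a path of field indices (parent index first, then child
     * indices) against a row modeled as nested Object arrays.
     */
    public static Object resolve(Object[] row, int[] indexPath) {
        Object current = row;
        for (int index : indexPath) {
            if (!(current instanceof Object[])) {
                throw new IllegalArgumentException(
                        "Path descends into a value that is not a row");
            }
            current = ((Object[]) current)[index];
        }
        return current;
    }

    /**
     * Evaluates an equality filter on the nested field, roughly what a
     * source could do once such a filter has been pushed down to it.
     */
    public static boolean matches(Object[] row, int[] indexPath, Object literal) {
        return literal.equals(resolve(row, indexPath));
    }

    public static void main(String[] args) {
        // Assumed layout: (id INT, user ROW<name STRING, address ROW<city STRING>>)
        Object[] row = {1, new Object[] {"alice", new Object[] {"Tempe"}}};

        // user (index 1) -> address (index 1) -> city (index 0)
        int[] cityPath = {1, 1, 0};

        System.out.println(matches(row, cityPath, "Tempe"));
        System.out.println(matches(row, cityPath, "Phoenix"));
    }
}
```

A single top-level index would only identify the `user` column here, which is why a reference limited to one index cannot express a predicate on `user.address.city`.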