Hi, Venkata

Thanks for reporting this issue. Currently, Flink doesn't support nested
field filter pushdown. I agree that this optimization would be useful,
especially for jobs that need to read a lot of data from Parquet or ORC
files. We haven't pursued it so far because of other priorities.

Regarding your three questions, I will get back to you once my on-call
shift is over, because I need to dive into the source code first. As for
your commit, I don't think it's the right solution:
FieldReferenceExpression currently cannot reference nested fields, so it
can't express a nested field filter for pushdown. We may need to extend it.
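
To make "extend it" a bit more concrete, here is a rough sketch of one
possible direction. It assumes a nested reference carries a full index
path (top-level column index first, then child indices) instead of a
single field index. This is similar in spirit to the int[][] index paths
that SupportsProjectionPushDown#applyProjection uses for nested projection.
NestedFieldReference is a hypothetical class, not an existing Flink API.
Rows are modeled as nested Object[] arrays purely for illustration.

```java
/**
 * Hypothetical sketch (not a Flink API): a field reference that carries the
 * full access path into nested ROW types instead of a single field index.
 */
public final class NestedFieldReference {
    private final String[] names;  // e.g. {"user", "address", "city"}
    private final int[] indices;   // index at each level: top-level column first, then children

    public NestedFieldReference(String[] names, int[] indices) {
        if (names.length == 0 || names.length != indices.length) {
            throw new IllegalArgumentException(
                    "names and indices must be non-empty and of equal length");
        }
        this.names = names.clone();
        this.indices = indices.clone();
    }

    /** Index of the top-level column, i.e. what a plain field reference carries today. */
    public int topLevelIndex() {
        return indices[0];
    }

    /** True when the reference points below a top-level column. */
    public boolean isNested() {
        return indices.length > 1;
    }

    /** Dotted name, e.g. "user.address.city". */
    @Override
    public String toString() {
        return String.join(".", names);
    }

    /**
     * Walks a row modeled as nested Object[] arrays (a stand-in for a real row
     * format) and returns the referenced value, or null if any parent is null.
     */
    public Object resolve(Object[] row) {
        Object current = row;
        for (int index : indices) {
            if (current == null) {
                return null; // a null parent row makes the nested field null
            }
            current = ((Object[]) current)[index];
        }
        return current;
    }

    public static void main(String[] args) {
        Object[] address = {"Main St", "Tempe"};
        Object[] user = {"alice", 42, address};
        Object[] row = {1L, user};
        NestedFieldReference ref = new NestedFieldReference(
                new String[] {"user", "address", "city"}, new int[] {1, 2, 1});
        System.out.println(ref + " -> " + ref.resolve(row)); // user.address.city -> Tempe
    }
}
```

Keeping the top-level index directly available means that filters on
non-nested columns could keep working unchanged. Your second question
would still need to be worked out: how such a reference converts back to a
RexNode. Presumably that means rebuilding a chain of Calcite field accesses
(e.g. via RexBuilder#makeFieldAccess) on top of the input reference.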

In the meantime, feel free to keep exploring possible solutions, and we
can discuss them afterwards.

Best,
Ron


On Sat, Jul 29, 2023 at 03:31, Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:

> Hi all,
>
> I am currently working on adding support for nested field filter
> pushdown. In our use case of running Flink in batch mode, we found that
> nested field filter pushdown is key: without it, our jobs run
> significantly slower. Note that Spark SQL already supports nested field
> filter pushdown.
>
> While debugging with IcebergTableSource as the table source, I
> narrowed the issue down to missing support in
> RexNodeExtractor#RexNodeToExpressionConverter#visitFieldAccess.
> To fix it, I changed it to return an Option(FieldReferenceExpression)
> that references the parent index and the child index of the nested
> field, along with its data type info.
>
> However, this new ResolvedExpression cannot be converted back to a
> RexNode, which is required in PushFilterIntoSourceScanRuleBase
> <https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushFilterIntoSourceScanRuleBase.java#L104>.
>
> A few questions:
>
> 1. Does FieldReferenceExpression currently support nested fields, or
> should it be extended to support them? I couldn't figure this out from
> PushProjectIntoTableScanRule, which supports nested column projection
> pushdown.
> 2. ExpressionConverter
> <https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/expressions/converter/ExpressionConverter.java#L197>
> converts ResolvedExpression -> RexNode, but the new
> FieldReferenceExpression with the nested field cannot be converted to a
> RexNode. This is why the answer to the first question is key.
> 3. Is there anything else I'm missing? Or is there an even easier way to
> add support for nested field filter pushdown?
>
> Partially working changes: Commit
> <https://github.com/venkata91/flink/commit/00cdf34ecf9be3ba669a97baaed4b69b85cd26f9>.
> Please feel free to leave comments directly on the commit.
>
> Any pointers here would be much appreciated! Thanks in advance.
>
> Disclaimer: I'm relatively new to the Flink code base, especially the
> Table planner :-).
>
> Regards
> Venkata krishnan
>
