Hi, Venkata krishnan Thanks for driving this work, look forward to your FLIP.
Best, Ron Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月13日周日 14:34写道: > Thanks Yunhong. That's correct. I am able to make it work locally. > Currently, in the process of writing a FLIP for the necessary changes to > the SupportsFilterPushDown API to support nested fields filter push down. > > Regards > Venkata krishnan > > > On Mon, Aug 7, 2023 at 8:28 PM yh z <zhengyunhon...@gmail.com> wrote: > > > Hi Venkatakrishnan, > > Sorry for the late reply. I have looked at the code and feel like you > need > > to modify the logic of the > > ExpressionConverter.visit(FieldReferenceExpression expression) method to > > support nested types, > > which are not currently supported in currently code. > > > > Regards, > > Yunhong Zheng (Swuferhong) > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月7日周一 13:30写道: > > > > > (Sorry, I pressed send too early) > > > > > > Thanks for the help @zhengyunhon...@gmail.com. > > > > > > Agree on not changing the API as much as possible as well as wrt > > > simplifying Projection pushdown with nested fields eventually as well. > > > > > > In terms of the code itself, currently I am trying to leverage the > > > FieldReferenceExpression to also handle nested fields for filter push > > down. > > > But where I am currently struggling to make progress is, once the > filters > > > are pushed to the table source itself, in > > > > PushFilterIntoSourceScanRuleBase#resolveFiltersAndCreateTableSourceTable > > > there is a conversion from List<ResolvedExpression (in this case > > > FieldReferenceExpression) to the List<RexNode> itself. > > > > > > If you have some pointers for that, please let me know. Thanks. > > > > > > Regards > > > Venkata krishnan > > > > > > > > > On Sun, Aug 6, 2023 at 10:23 PM Venkatakrishnan Sowrirajan < > > > vsowr...@asu.edu> > > > wrote: > > > > > > > Thanks @zhengyunhon...@gmail.com > > > > Regards > > > > Venkata krishnan > > > > > > > > > > > > On Sun, Aug 6, 2023 at 6:16 PM yh z <zhengyunhon...@gmail.com> > wrote: > > > > > > > >> Hi, Venkatakrishnan, > > > >> I think this is a very useful feature. I have been focusing on the > > > >> development of the flink-table-planner module recently, so if you > need > > > >> some > > > >> help, I can assist you in completing the development of some > sub-tasks > > > or > > > >> code review. > > > >> > > > >> Returning to the design itself, I think it's necessary to modify > > > >> FieldReferenceExpression or re-implement a > > > NestedFieldReferenceExpression. > > > >> As for modifying the interface of SupportsProjectionPushDown, I > think > > we > > > >> need to make some trade-offs. As a connector developer, the > stability > > of > > > >> the interface is very important. If there are no unresolved bugs, I > > > >> personally do not recommend modifying the interface. However, when I > > > first > > > >> read the code of SupportsProjectionPushDown, the design of int[][] > was > > > >> very > > > >> confusing for me, and it took me a long time to understand it by > > running > > > >> specify UT tests. Therefore, in terms of the design of this > interface > > > and > > > >> the consistency between different interfaces, there is indeed room > for > > > >> improvement it. > > > >> > > > >> Thanks, > > > >> Yunhong Zheng (Swuferhong) > > > >> > > > >> > > > >> > > > >> > > > >> Becket Qin <becket....@gmail.com> 于2023年8月3日周四 07:44写道: > > > >> > > > >> > Hi Jark, > > > >> > > > > >> > If the FieldReferenceExpression contains an int[] to support a > > nested > > > >> field > > > >> > reference, List<FieldReferenceExpression> (or > > > >> FieldReferenceExpression[]) > > > >> > and int[][] are actually equivalent. If we are designing this from > > > >> scratch, > > > >> > personally I prefer using List<FieldReferenceExpression> for > > > >> consistency, > > > >> > i.e. always resolving everything to expressions for users. > > Projection > > > >> is a > > > >> > simpler case, but should not be a special case. This avoids doing > > the > > > >> same > > > >> > thing in different ways which is also a confusion to the users. To > > me, > > > >> the > > > >> > int[][] format would become kind of a technical debt after we > extend > > > the > > > >> > FieldReferenceExpression. Although we don't have to address it > right > > > >> away > > > >> > in the same FLIP, this kind of debt accumulates over time and > makes > > > the > > > >> > project harder to learn and maintain. So, personally I prefer to > > > address > > > >> > these technical debts as soon as possible. > > > >> > > > > >> > Thanks, > > > >> > > > > >> > Jiangjie (Becket) Qin > > > >> > > > > >> > On Wed, Aug 2, 2023 at 8:19 PM Jark Wu <imj...@gmail.com> wrote: > > > >> > > > > >> > > Hi, > > > >> > > > > > >> > > I agree with Becket that we may need to extend > > > >> FieldReferenceExpression > > > >> > to > > > >> > > support nested field access (or maybe a new > > > >> > > NestedFieldReferenceExpression). > > > >> > > But I have some concerns about evolving the > > > >> > > SupportsProjectionPushDown.applyProjection. > > > >> > > A projection is much simpler than Filter Expression which only > > needs > > > >> to > > > >> > > represent the field indexes. > > > >> > > If we evolve `applyProjection` to accept > > > >> `List<FieldReferenceExpression> > > > >> > > projectedFields`, > > > >> > > users have to convert the `List<FieldReferenceExpression>` back > to > > > >> > int[][] > > > >> > > which is an overhead for users. > > > >> > > Field indexes (int[][]) is required to project schemas with the > > > >> > > utility org.apache.flink.table.connector.Projection. > > > >> > > > > > >> > > > > > >> > > Best, > > > >> > > Jark > > > >> > > > > > >> > > > > > >> > > > > > >> > > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan < > > > >> > vsowr...@asu.edu> > > > >> > > wrote: > > > >> > > > > > >> > > > Thanks Becket for the suggestion. That makes sense. Let me try > > it > > > >> out > > > >> > and > > > >> > > > get back to you. > > > >> > > > > > > >> > > > Regards > > > >> > > > Venkata krishnan > > > >> > > > > > > >> > > > > > > >> > > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin < > becket....@gmail.com > > > > > > >> > wrote: > > > >> > > > > > > >> > > > > This is a very useful feature in practice. > > > >> > > > > > > > >> > > > > It looks to me that the key issue here is that Flink > > > >> > ResolvedExpression > > > >> > > > > does not have necessary abstraction for nested field access. > > So > > > >> the > > > >> > > > Calcite > > > >> > > > > RexFieldAccess does not have a counterpart in the > > > >> ResolvedExpression. > > > >> > > The > > > >> > > > > FieldReferenceExpression only supports direct access to the > > > >> fields, > > > >> > not > > > >> > > > > nested access. > > > >> > > > > > > > >> > > > > Theoretically speaking, this nested field reference is also > > > >> required > > > >> > by > > > >> > > > > projection pushdown. However, we addressed that by using an > > > >> int[][] > > > >> > in > > > >> > > > the > > > >> > > > > SupportsProjectionPushDown interface. Maybe we can do the > > > >> following: > > > >> > > > > > > > >> > > > > 1. Extend the FieldReferenceExpression to include an int[] > for > > > >> nested > > > >> > > > field > > > >> > > > > access, > > > >> > > > > 2. By doing (1), > > > >> > > > > > SupportsFilterPushDown#applyFilters(List<ResolvedExpression>) > > > can > > > >> > > support > > > >> > > > > nested field access. > > > >> > > > > 3. Evolve the > > SupportsProjectionPushDown.applyProjection(int[][] > > > >> > > > > projectedFields, DataType producedDataType) to > > > >> > > > > applyProjection(List<FieldReferenceExpression> > > projectedFields, > > > >> > > DataType > > > >> > > > > producedDataType) > > > >> > > > > > > > >> > > > > This will need a FLIP. > > > >> > > > > > > > >> > > > > Thanks, > > > >> > > > > > > > >> > > > > Jiangjie (Becket) Qin > > > >> > > > > > > > >> > > > > On Tue, Aug 1, 2023 at 11:42 PM Venkatakrishnan Sowrirajan < > > > >> > > > > vsowr...@asu.edu> > > > >> > > > > wrote: > > > >> > > > > > > > >> > > > > > Thanks for the response. Looking forward to your pointers. > > In > > > >> the > > > >> > > > > > meanwhile, let me figure out how we can implement it. Will > > > keep > > > >> you > > > >> > > > > posted. > > > >> > > > > > > > > >> > > > > > On Mon, Jul 31, 2023, 11:43 PM liu ron < > ron9....@gmail.com> > > > >> wrote: > > > >> > > > > > > > > >> > > > > > > Hi, Venkata > > > >> > > > > > > > > > >> > > > > > > Thanks for reporting this issue. Currently, Flink > doesn't > > > >> support > > > >> > > > > nested > > > >> > > > > > > filter pushdown. I also think that this optimization > would > > > be > > > >> > > useful, > > > >> > > > > > > especially for jobs, which may need to read a lot of > data > > > from > > > >> > the > > > >> > > > > > parquet > > > >> > > > > > > or orc file. We didn't move forward with this for some > > > >> priority > > > >> > > > > reasons. > > > >> > > > > > > > > > >> > > > > > > Regarding your three questions, I will respond to you > > later > > > >> after > > > >> > > my > > > >> > > > > > > on-call is finished because I need to dive into the > source > > > >> code. > > > >> > > > About > > > >> > > > > > your > > > >> > > > > > > commit, I don't think it's the right solution because > > > >> > > > > > > FieldReferenceExpression doesn't currently support > nested > > > >> field > > > >> > > > filter > > > >> > > > > > > pushdown, maybe we need to extend it. > > > >> > > > > > > > > > >> > > > > > > You can also look further into reasonable solutions, > which > > > >> we'll > > > >> > > > > discuss > > > >> > > > > > > further later on. > > > >> > > > > > > > > > >> > > > > > > Best, > > > >> > > > > > > Ron > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> > > 于2023年7月29日周六 > > > >> > > 03:31写道: > > > >> > > > > > > > > > >> > > > > > > > Hi all, > > > >> > > > > > > > > > > >> > > > > > > > Currently, I am working on adding support for nested > > > fields > > > >> > > filter > > > >> > > > > push > > > >> > > > > > > > down. In our use case running Flink on Batch, we found > > > >> nested > > > >> > > > fields > > > >> > > > > > > filter > > > >> > > > > > > > push down is key - without it, it is significantly > slow. > > > >> Note: > > > >> > > > Spark > > > >> > > > > > SQL > > > >> > > > > > > > supports nested fields filter push down. > > > >> > > > > > > > > > > >> > > > > > > > While debugging the code using IcebergTableSource as > the > > > >> table > > > >> > > > > source, > > > >> > > > > > > > narrowed down the issue to missing support for > > > >> > > > > > > > > > > >> RexNodeExtractor#RexNodeToExpressionConverter#visitFieldAccess. > > > >> > > > > > > > As part of fixing it, I made changes by returning an > > > >> > > > > > > > Option(FieldReferenceExpression) > > > >> > > > > > > > with appropriate reference to the parent index and the > > > child > > > >> > > index > > > >> > > > > for > > > >> > > > > > > the > > > >> > > > > > > > nested field with the data type info. > > > >> > > > > > > > > > > >> > > > > > > > But this new ResolvedExpression cannot be converted to > > > >> RexNode > > > >> > > > which > > > >> > > > > > > > happens in PushFilterIntoSourceScanRuleBase > > > >> > > > > > > > < > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > > > https://urldefense.com/v3/__https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushFilterIntoSourceScanRuleBase.java*L104__;Iw!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9Q5cMnVs$ > > > >> > > > > > > > > > > > >> > > > > > > > . > > > >> > > > > > > > > > > >> > > > > > > > Few questions > > > >> > > > > > > > > > > >> > > > > > > > 1. Does FieldReferenceExpression support nested fields > > > >> > currently > > > >> > > or > > > >> > > > > > > should > > > >> > > > > > > > it be extended to support nested fields? I couldn't > > figure > > > >> this > > > >> > > out > > > >> > > > > > from > > > >> > > > > > > > the PushProjectIntoTableScanRule that supports nested > > > column > > > >> > > > > projection > > > >> > > > > > > > push down. > > > >> > > > > > > > 2. ExpressionConverter > > > >> > > > > > > > < > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > > > https://urldefense.com/v3/__https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/expressions/converter/ExpressionConverter.java*L197__;Iw!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9Z6jnkJm$ > > > >> > > > > > > > > > > > >> > > > > > > > converts ResolvedExpression -> RexNode but the new > > > >> > > > > > > FieldReferenceExpression > > > >> > > > > > > > with the nested field cannot be converted to RexNode. > > This > > > >> is > > > >> > why > > > >> > > > the > > > >> > > > > > > > answer to the 1st question is key. > > > >> > > > > > > > 3. Anything else that I'm missing here? or is there an > > > even > > > >> > > easier > > > >> > > > > way > > > >> > > > > > to > > > >> > > > > > > > add support for nested fields filter push down? > > > >> > > > > > > > > > > >> > > > > > > > Partially working changes - Commit > > > >> > > > > > > > < > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > > > https://urldefense.com/v3/__https://github.com/venkata91/flink/commit/00cdf34ecf9be3ba669a97baaed4b69b85cd26f9__;!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9XeOjJ_a$ > > > >> > > > > > > > > > > > >> > > > > > > > Please > > > >> > > > > > > > feel free to leave a comment directly in the commit. > > > >> > > > > > > > > > > >> > > > > > > > Any pointers here would be much appreciated! Thanks in > > > >> advance. > > > >> > > > > > > > > > > >> > > > > > > > Disclaimer: Relatively new to Flink code base > especially > > > >> Table > > > >> > > > > planner > > > >> > > > > > > :-). > > > >> > > > > > > > > > > >> > > > > > > > Regards > > > >> > > > > > > > Venkata krishnan > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > > > > > > >