Hi Venkat, Thanks for the proposal.
I have some minor comments about the FLIP. 1. I think we don't need to add SupportsFilterPushDown#supportsNestedFilters() method, because connectors can skip nested filters by putting them in Result#remainingFilters(). And this is backward-compatible because unknown expressions were added to the remaining filters. Planner should push predicate expressions as more as possible. If we add a flag for each new filter, the interface will be filled with lots of flags (e.g., supportsBetween, supportsIN). 2. NestedFieldReferenceExpression#nestedFieldName should be an array of field names? Each string represents a field name part of the field path. Just keep aligning with `nestedFieldIndexArray`. 3. My concern about making FieldReferenceExpression support nested fields is the compatibility. It is a public API and users/connectors are already using it. People assumed it is a top-level column reference, and applied logic on it. But that's not true now and this may lead to unexpected errors. Having a separate NestedFieldReferenceExpression sounds safer to me. Mixing them in a class may confuse users what's the meaning of getFieldName() and getFieldIndex(). Regarding using NestedFieldReferenceExpression in SupportsProjectionPushDown, do you have any concerns @Timo Walther <twal...@apache.org> ? Best, Jark On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote: > Sounds like a great suggestion, Becket. +1. Agree with cleaning up the APIs > and making it consistent in all the pushdown APIs. > > Your suggested approach seems fine to me, unless anyone else has any other > concerns. Just have couple of clarifying questions: > > 1. Do you think we should standardize the APIs across all the pushdown > supports like SupportsPartitionPushdown, SupportsDynamicFiltering etc in > the end state? > > The current proposal works if we do not want to migrate > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in the > > long term. > > > Did you mean *FieldReferenceExpression* instead of > *NestedFieldReferenceExpression*? > > 2. Extend the FieldReferenceExpression to support nested fields. > > - Change the index field type from int to int[]. > > - Add a new method int[] getFieldIndexArray(). > > - Deprecate the int getFieldIndex() method, the code will be removed > in > > the next major version bump. > > I assume getFieldIndex would return fieldIndexArray[0], right? > > Thanks > Venkat > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <becket....@gmail.com> wrote: > > > Thanks for the proposal, Venkata. > > > > The current proposal works if we do not want to migrate > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in the > > long term. > > > Did you mean *FieldReferenceExpression* instead of > *NestedFieldReferenceExpression*? > > > > > Otherwise, the alternative solution briefly mentioned in the rejected > > alternatives would be the following: > > Phase 1: > > 1. Introduce a supportsNestedFilters() method to the > SupportsFilterPushdown > > interface. (same as current proposal). > > 2. Extend the FieldReferenceExpression to support nested fields. > > - Change the index field type from int to int[]. > > - Add a new method int[] getFieldIndexArray(). > > - Deprecate the int getFieldIndex() method, the code will be removed > in > > the next major version bump. > > > > 3. In the SupportsProjectionPushDown interface > > - add a new method applyProjection(List<FieldReferenceExpression>, > > DataType), with default implementation invoking applyProjection(int[][], > > DataType) > > - deprecate the current applyProjection(int[][], DataType) method > > > > Phase 2 (in the next major version bump) > > 1. remove the deprecated methods. > > > > Phase 3 (optional) > > 1. deprecate and remove the supportsNestedFilters() / > > supportsNestedProjection() methods from the SupportsFilterPushDown / > > SupportsProjectionPushDown interfaces. > > > > Personally I prefer this alternative. It takes longer to finish the work, > > but the API eventually becomes clean and consistent. But I can live with > > the current proposal. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan < > > vsowr...@asu.edu> wrote: > > > > > Gentle ping for reviews/feedback. > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan < > > vsowr...@asu.edu > > > > > > > wrote: > > > > > > > Hi All, > > > > > > > > I am opening this thread to discuss FLIP-356: Support Nested Fields > > > > Filter Pushdown. The FLIP can be found at > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ > > > > > > > > This FLIP adds support for pushing down nested fields filters to the > > > > underlying TableSource. In our data lake, we find a lot of datasets > > have > > > > nested fields and also user queries with filters defined on the > nested > > > > fields. This would drastically improve the performance for those sets > > of > > > > queries. > > > > > > > > Appreciate any comments or feedback you may have on this proposal. > > > > > > > > Regards > > > > Venkata krishnan > > > > > > > > > >