Jark, How about having a separate NestedFieldReferenceExpression, and > abstracting a common base class "ReferenceExpression" for > NestedFieldReferenceExpression and FieldReferenceExpression? This makes > unifying expressions in > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > ...)" > possible.
This should be fine for *SupportsProjectionPushDown* and *SupportsFilterPushDown*. One concern in the case of *SupportsAggregatePushDown* with nested fields support (to be added in the future), with this proposal, the API will become backwards incompatible as the *args *for the aggregate function is *List<FieldReferenceExpression> *that needs to change to *List<ReferenceExpression>*. Regards Venkata krishnan On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> wrote: > Hi Becket, > > I think it is the second case, that a FieldReferenceExpression is > constructed > by the framework and passed to the connector (interfaces listed by > Venkata[1] > and Catalog#listPartitionsByFilter). Besides, understanding the nested > field > is optional for users/connectors (just treat it as an unknown expression if > the > connector doesn't want to support it). > > If we extend FieldReferenceExpression, in the case of "where col.nested > > 10", > for the connectors already supported filter/delete pushdown, they may > wrongly > pushdown "col > 10" instead of "nested > 10" because they still treat > FieldReferenceExpression as a top-level column. This problem can be > resolved > by introducing an additional "supportedNestedPushdown" for each interface, > but that method is not elegant and is hard to remove in the future, and > this could > be avoided if we have a separate NestedFieldReferenceExpression. > > If we want to extend FieldReferenceExpression, we have to add protections > for every related API in one shot. Besides, FieldReferenceExpression is a > fundamental class in the planner, we have to go through all the code that > is using it to make sure it properly handling it if it is a nested field > which > is a big effort for the community. > > If we were designing this API on day 1, I fully support merging them in a > FieldReferenceExpression. But in this case, I'm thinking about how to > provide > users with a smooth migration path, and allow the community to gradually > put efforts into evolving the API, and not block the "Nested Fields Filter > Pushdown" > requirement. > > How about having a separate NestedFieldReferenceExpression, and > abstracting a common base class "ReferenceExpression" for > NestedFieldReferenceExpression and FieldReferenceExpression? This makes > unifying expressions in > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > ...)" > possible. > > Best, > Jark > > On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <vsowr...@asu.edu > > > wrote: > > > Becket and Jark, > > > > Deprecate all the other > > > methods except tryApplyFilters() and tryApplyProjections(). > > > > For *SupportsProjectionPushDown*, we still need a > > *supportsNestedProjections* API on the table source as some of the table > > sources might not be able to handle nested fields and therefore the Flink > > planner should not push down the nested projections or else the > > *applyProjection > > *API has to be appropriately changed to return > > *unconvertibleProjections *similar > > to *SupportsFilterPushDown*. > > > > Or we have to introduce two different applyProjections() > > > methods for FieldReferenceExpression / NestedFieldReferenceExpression > > > respectively. > > > > Agree this is not preferred. Given that *supportNestedProjections *cannot > > be deprecated/removed based on the current API form, extending > > *FieldReferenceExpression* to support nested fields should be okay. > > > > Another alternative could be to change *applyProjections *to take > > List<ResolvedExpression> and on the connector side they choose to handle > > *FieldReferenceExpression* and *NestedFieldReferenceExpression *as > > applicable and return the remainingProjections. In the case of nested > field > > projections not supported, it should return them back but only projecting > > the top level fields. IMO, this is also *not preferred*. > > > > *SupportsAggregatePushDown* > > > > *AggregateExpression *currently takes in a list of > > *FieldReferenceExpression* as args for the aggregate function, if in > future > > *SupportsAggregatePushDown* adds support for aggregate pushdown on nested > > fields then the AggregateExpression API also has to change if a new > > NestedFieldReferenceExpression is introduced for nested fields. > > > > If we add a > > > flag for each new filter, > > > the interface will be filled with lots of flags (e.g., supportsBetween, > > > supportsIN) > > > > In an ideal situation, I completely agree with you. But in the current > > state, *supportsNestedFilters* can act as a bridge to reach the eventual > > desired state which is to have a clean and consistent set of APIs > > throughout all Supports*PushDown. > > > > Also shared some thoughts on the end state API > > < > > > https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$ > > > > > with extension to the *FieldReferenceExpression* to support nested > fields. > > Please take a look. > > > > Regards > > Venkata krishnan > > > > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <becket....@gmail.com> wrote: > > > > > Hi Jark, > > > > > > Regarding the migration path, it would be useful to scrutinize the use > > case > > > of FiledReferenceExpression and ResolvedExpressions. There are two > kinds > > of > > > use cases: > > > > > > 1. A ResolvedExpression is constructed by the user or connector / > plugin > > > developers. > > > 2. A ResolvedExpression is constructed by the framework and passed to > > user > > > or connector / plugin developers. > > > > > > For the first case, both of the approaches provide the same migration > > > experience. > > > > > > For the second case, generally speaking, introducing > > > NestedFieldReferenceExpression and extending FieldReferenceExpression > > would > > > have the same impact for backwards compatibility. > SupportsFilterPushDown > > is > > > a special case here because understanding the filter expressions is > > > optional for the source implementation. In other use cases, if > > > understanding the reference to a nested field is a must have, the user > > code > > > has to be changed, regardless of which approach we take to support > nested > > > fields. > > > > > > Therefore, I think we have to check each public API where the nested > > field > > > reference is exposed. If we have many public APIs where understanding > > > nested fields is optional for the user / plugin / connector > developers, > > > having a separate NestedFieldReferenceExpression would have a more > smooth > > > migration. Otherwise, there seems to be no difference between the two > > > approaches. > > > > > > Migration path aside, the main reason I prefer extending > > > FieldReferenceExpression over a new NestedFieldReferenceExpression is > > > because this makes the SupportsProjectionPushDown interface simpler. > > > Otherwise, we have to treat it as a special case that does not match > the > > > overall API style. Or we have to introduce two different > > applyProjections() > > > methods for FieldReferenceExpression / NestedFieldReferenceExpression > > > respectively. This issue further extends to implementation in addition > to > > > public API. A single FieldReferenceExpression might help simplify the > > > implementation code a little bit. For example, in a recursive > processing > > of > > > a row with nested rows, we may not need to switch between > > > FieldReferenceExpression and NestedFieldReferenceExpression depending > on > > > whether the record being processed is a top level record or nested > > record. > > > > > > Thanks, > > > > > > Jiangjie (Becket) Qin > > > > > > > > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <imj...@gmail.com> wrote: > > > > > > > Hi Becket, > > > > > > > > I totally agree we should try to have a consistent API for a final > > state. > > > > The only concern I have mentioned is the "smooth" migration path. > > > > The FiledReferenceExpression is widely used in many public APIs, > > > > not only in the SupportsFilterPushDown. Yes, we can change every > > > > methods in 2-steps, but is it good to change API back and forth for > > this? > > > > Personally, I'm fine with a separate NestedFieldReferenceExpression > > > class. > > > > TBH, I prefer the separated way because it makes the reference > > expression > > > > more clear and concise. > > > > > > > > Best, > > > > Jark > > > > > > > > > > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com> > wrote: > > > > > > > > > Thanks for the reply, Jark. > > > > > > > > > > I think it will be helpful to understand the final state we want to > > > > > eventually achieve first, then we can discuss the steps towards > that > > > > final > > > > > state. > > > > > > > > > > It looks like there are two proposed end states now: > > > > > > > > > > 1. Have a separate NestedFieldReferenceExpression class; keep > > > > > SupportsFilterPushDown and SupportsProjectionPushDown the same. It > is > > > > just > > > > > a one step change. > > > > > - Regarding the supportsNestedFilterPushDown() method, if our > > > contract > > > > > with the connector developer today is "The implementation should > > ignore > > > > > unrecognized expressions by putting them into the remaining > filters, > > > > > instead of throwing exceptions". Then there is no need for this > > > method. I > > > > > am not sure about the current contract. We should probably make it > > > clear > > > > in > > > > > the interface Java doc. > > > > > > > > > > 2. Extend the existing FiledReferenceExpression class to support > > nested > > > > > fields; SupportsFilterPushDown only has one method of > > > > > applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown > > only > > > > has > > > > > one method of applyProjections(List<FieldReferenceExpression>, > > > DataType). > > > > > It could just be two steps if we are not too obsessed with the > exact > > > > names > > > > > of "applyFilters" and "applyProjections". More specifically, it > takes > > > two > > > > > steps to achieve this final state: > > > > > a. introduce a new method > > tryApplyFilters(List<ResolvedExpression>) > > > > to > > > > > SupportsFilterPushDown, which may have FiledReferenceExpression > with > > > > nested > > > > > fields. The default implementation throws an exception. The runtime > > > will > > > > > first call tryApplyFilters() with nested fields. In case of > > exception, > > > it > > > > > calls the existing applyFilters() without including the nested > > filters. > > > > > Similarly, in SupportsProjectionPushDown, introduce a > > > > > tryApplyProjections<List<NestedFieldReference> method returning a > > > Result. > > > > > The Result also contains the accepted and unapplicable projections. > > The > > > > > default implementation also throws an exception. Deprecate all the > > > other > > > > > methods except tryApplyFilters() and tryApplyProjections(). > > > > > b. remove the deprecated methods in the next major version > bump. > > > > > > > > > > Now the question is putting the migration steps aside, which end > > state > > > do > > > > > we prefer? While the first end state is acceptable for me, > > personally, > > > I > > > > > prefer the latter if we are designing from scratch. It is clean, > > > > consistent > > > > > and intuitive. Given the size of Flink, keeping APIs in the same > > style > > > > over > > > > > time is important. The migration is also not that complicated. > > > > > > > > > > Thanks, > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote: > > > > > > > > > > > Hi Venkat, > > > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > > > I have some minor comments about the FLIP. > > > > > > > > > > > > 1. I think we don't need to > > > > > > add SupportsFilterPushDown#supportsNestedFilters() method, > > > > > > because connectors can skip nested filters by putting them in > > > > > > Result#remainingFilters(). > > > > > > And this is backward-compatible because unknown expressions were > > > added > > > > to > > > > > > the remaining filters. > > > > > > Planner should push predicate expressions as more as possible. If > > we > > > > add > > > > > a > > > > > > flag for each new filter, > > > > > > the interface will be filled with lots of flags (e.g., > > > supportsBetween, > > > > > > supportsIN). > > > > > > > > > > > > 2. NestedFieldReferenceExpression#nestedFieldName should be an > > array > > > of > > > > > > field names? > > > > > > Each string represents a field name part of the field path. Just > > keep > > > > > > aligning with `nestedFieldIndexArray`. > > > > > > > > > > > > 3. My concern about making FieldReferenceExpression support > nested > > > > fields > > > > > > is the compatibility. > > > > > > It is a public API and users/connectors are already using it. > > People > > > > > > assumed it is a top-level column > > > > > > reference, and applied logic on it. But that's not true now and > > this > > > > may > > > > > > lead to unexpected errors. > > > > > > Having a separate NestedFieldReferenceExpression sounds safer to > > me. > > > > > Mixing > > > > > > them in a class may > > > > > > confuse users what's the meaning of getFieldName() and > > > > getFieldIndex(). > > > > > > > > > > > > > > > > > > Regarding using NestedFieldReferenceExpression in > > > > > > SupportsProjectionPushDown, do you > > > > > > have any concerns @Timo Walther <twal...@apache.org> ? > > > > > > > > > > > > Best, > > > > > > Jark > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan < > > > > > vsowr...@asu.edu > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Sounds like a great suggestion, Becket. +1. Agree with cleaning > > up > > > > the > > > > > > APIs > > > > > > > and making it consistent in all the pushdown APIs. > > > > > > > > > > > > > > Your suggested approach seems fine to me, unless anyone else > has > > > any > > > > > > other > > > > > > > concerns. Just have couple of clarifying questions: > > > > > > > > > > > > > > 1. Do you think we should standardize the APIs across all the > > > > pushdown > > > > > > > supports like SupportsPartitionPushdown, > SupportsDynamicFiltering > > > etc > > > > > in > > > > > > > the end state? > > > > > > > > > > > > > > The current proposal works if we do not want to migrate > > > > > > > > SupportsFilterPushdown to also use > > NestedFieldReferenceExpression > > > > in > > > > > > the > > > > > > > > long term. > > > > > > > > > > > > > > > Did you mean *FieldReferenceExpression* instead of > > > > > > > *NestedFieldReferenceExpression*? > > > > > > > > > > > > > > 2. Extend the FieldReferenceExpression to support nested > fields. > > > > > > > > - Change the index field type from int to int[]. > > > > > > > > > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > > > > > - Deprecate the int getFieldIndex() method, the code will > > be > > > > > > removed > > > > > > > in > > > > > > > > the next major version bump. > > > > > > > > > > > > > > I assume getFieldIndex would return fieldIndexArray[0], right? > > > > > > > > > > > > > > Thanks > > > > > > > Venkat > > > > > > > > > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin < > becket....@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > Thanks for the proposal, Venkata. > > > > > > > > > > > > > > > > The current proposal works if we do not want to migrate > > > > > > > > SupportsFilterPushdown to also use > > NestedFieldReferenceExpression > > > > in > > > > > > the > > > > > > > > long term. > > > > > > > > > > > > > > > Did you mean *FieldReferenceExpression* instead of > > > > > > > *NestedFieldReferenceExpression*? > > > > > > > > > > > > > > > > > > > > > > > Otherwise, the alternative solution briefly mentioned in the > > > > rejected > > > > > > > > alternatives would be the following: > > > > > > > > Phase 1: > > > > > > > > 1. Introduce a supportsNestedFilters() method to the > > > > > > > SupportsFilterPushdown > > > > > > > > interface. (same as current proposal). > > > > > > > > 2. Extend the FieldReferenceExpression to support nested > > fields. > > > > > > > > - Change the index field type from int to int[]. > > > > > > > > > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > > > > > - Deprecate the int getFieldIndex() method, the code will > > be > > > > > > removed > > > > > > > in > > > > > > > > the next major version bump. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. In the SupportsProjectionPushDown interface > > > > > > > > - add a new method > > > > > applyProjection(List<FieldReferenceExpression>, > > > > > > > > DataType), with default implementation invoking > > > > > > applyProjection(int[][], > > > > > > > > DataType) > > > > > > > > - deprecate the current applyProjection(int[][], > DataType) > > > > method > > > > > > > > > > > > > > > > Phase 2 (in the next major version bump) > > > > > > > > 1. remove the deprecated methods. > > > > > > > > > > > > > > > > Phase 3 (optional) > > > > > > > > 1. deprecate and remove the supportsNestedFilters() / > > > > > > > > supportsNestedProjection() methods from the > > > SupportsFilterPushDown > > > > / > > > > > > > > SupportsProjectionPushDown interfaces. > > > > > > > > > > > > > > > > Personally I prefer this alternative. It takes longer to > finish > > > the > > > > > > work, > > > > > > > > but the API eventually becomes clean and consistent. But I > can > > > live > > > > > > with > > > > > > > > the current proposal. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan < > > > > > > > > vsowr...@asu.edu> wrote: > > > > > > > > > > > > > > > > > Gentle ping for reviews/feedback. > > > > > > > > > > > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan < > > > > > > > > vsowr...@asu.edu > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > I am opening this thread to discuss FLIP-356: Support > > Nested > > > > > Fields > > > > > > > > > > Filter Pushdown. The FLIP can be found at > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ > > > > > > > > > > > > > > > > > > > > This FLIP adds support for pushing down nested fields > > filters > > > > to > > > > > > the > > > > > > > > > > underlying TableSource. In our data lake, we find a lot > of > > > > > datasets > > > > > > > > have > > > > > > > > > > nested fields and also user queries with filters defined > on > > > the > > > > > > > nested > > > > > > > > > > fields. This would drastically improve the performance > for > > > > those > > > > > > sets > > > > > > > > of > > > > > > > > > > queries. > > > > > > > > > > > > > > > > > > > > Appreciate any comments or feedback you may have on this > > > > > proposal. > > > > > > > > > > > > > > > > > > > > Regards > > > > > > > > > > Venkata krishnan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >