To keep it backwards compatible, introduce another API *applyAggregates *with *List<ReferenceExpression> *when nested field support is added and deprecate the current API. This will by default throw an exception. In flink planner, *applyAggregates *with nested fields and if it throws exception then *applyAggregates* without nested fields.
Regards Venkata krishnan On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan < vsowr...@asu.edu> wrote: > Jark, > > How about having a separate NestedFieldReferenceExpression, and >> abstracting a common base class "ReferenceExpression" for >> NestedFieldReferenceExpression and FieldReferenceExpression? This makes >> unifying expressions in >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> >> ...)" >> possible. > > This should be fine for *SupportsProjectionPushDown* and > *SupportsFilterPushDown*. One concern in the case of > *SupportsAggregatePushDown* with nested fields support (to be added in > the future), with this proposal, the API will become backwards incompatible > as the *args *for the aggregate function is *List<FieldReferenceExpression> > *that needs to change to *List<ReferenceExpression>*. > > Regards > Venkata krishnan > > > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> wrote: > >> Hi Becket, >> >> I think it is the second case, that a FieldReferenceExpression is >> constructed >> by the framework and passed to the connector (interfaces listed by >> Venkata[1] >> and Catalog#listPartitionsByFilter). Besides, understanding the nested >> field >> is optional for users/connectors (just treat it as an unknown expression >> if >> the >> connector doesn't want to support it). >> >> If we extend FieldReferenceExpression, in the case of "where col.nested > >> 10", >> for the connectors already supported filter/delete pushdown, they may >> wrongly >> pushdown "col > 10" instead of "nested > 10" because they still treat >> FieldReferenceExpression as a top-level column. This problem can be >> resolved >> by introducing an additional "supportedNestedPushdown" for each interface, >> but that method is not elegant and is hard to remove in the future, and >> this could >> be avoided if we have a separate NestedFieldReferenceExpression. >> >> If we want to extend FieldReferenceExpression, we have to add protections >> for every related API in one shot. Besides, FieldReferenceExpression is a >> fundamental class in the planner, we have to go through all the code that >> is using it to make sure it properly handling it if it is a nested field >> which >> is a big effort for the community. >> >> If we were designing this API on day 1, I fully support merging them in a >> FieldReferenceExpression. But in this case, I'm thinking about how to >> provide >> users with a smooth migration path, and allow the community to gradually >> put efforts into evolving the API, and not block the "Nested Fields Filter >> Pushdown" >> requirement. >> >> How about having a separate NestedFieldReferenceExpression, and >> abstracting a common base class "ReferenceExpression" for >> NestedFieldReferenceExpression and FieldReferenceExpression? This makes >> unifying expressions in >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> >> ...)" >> possible. >> >> Best, >> Jark >> >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan < >> vsowr...@asu.edu> >> wrote: >> >> > Becket and Jark, >> > >> > Deprecate all the other >> > > methods except tryApplyFilters() and tryApplyProjections(). >> > >> > For *SupportsProjectionPushDown*, we still need a >> > *supportsNestedProjections* API on the table source as some of the table >> > sources might not be able to handle nested fields and therefore the >> Flink >> > planner should not push down the nested projections or else the >> > *applyProjection >> > *API has to be appropriately changed to return >> > *unconvertibleProjections *similar >> > to *SupportsFilterPushDown*. >> > >> > Or we have to introduce two different applyProjections() >> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression >> > > respectively. >> > >> > Agree this is not preferred. Given that *supportNestedProjections >> *cannot >> > be deprecated/removed based on the current API form, extending >> > *FieldReferenceExpression* to support nested fields should be okay. >> > >> > Another alternative could be to change *applyProjections *to take >> > List<ResolvedExpression> and on the connector side they choose to handle >> > *FieldReferenceExpression* and *NestedFieldReferenceExpression *as >> > applicable and return the remainingProjections. In the case of nested >> field >> > projections not supported, it should return them back but only >> projecting >> > the top level fields. IMO, this is also *not preferred*. >> > >> > *SupportsAggregatePushDown* >> > >> > *AggregateExpression *currently takes in a list of >> > *FieldReferenceExpression* as args for the aggregate function, if in >> future >> > *SupportsAggregatePushDown* adds support for aggregate pushdown on >> nested >> > fields then the AggregateExpression API also has to change if a new >> > NestedFieldReferenceExpression is introduced for nested fields. >> > >> > If we add a >> > > flag for each new filter, >> > > the interface will be filled with lots of flags (e.g., >> supportsBetween, >> > > supportsIN) >> > >> > In an ideal situation, I completely agree with you. But in the current >> > state, *supportsNestedFilters* can act as a bridge to reach the eventual >> > desired state which is to have a clean and consistent set of APIs >> > throughout all Supports*PushDown. >> > >> > Also shared some thoughts on the end state API >> > < >> > >> https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$ >> > > >> > with extension to the *FieldReferenceExpression* to support nested >> fields. >> > Please take a look. >> > >> > Regards >> > Venkata krishnan >> > >> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <becket....@gmail.com> >> wrote: >> > >> > > Hi Jark, >> > > >> > > Regarding the migration path, it would be useful to scrutinize the use >> > case >> > > of FiledReferenceExpression and ResolvedExpressions. There are two >> kinds >> > of >> > > use cases: >> > > >> > > 1. A ResolvedExpression is constructed by the user or connector / >> plugin >> > > developers. >> > > 2. A ResolvedExpression is constructed by the framework and passed to >> > user >> > > or connector / plugin developers. >> > > >> > > For the first case, both of the approaches provide the same migration >> > > experience. >> > > >> > > For the second case, generally speaking, introducing >> > > NestedFieldReferenceExpression and extending FieldReferenceExpression >> > would >> > > have the same impact for backwards compatibility. >> SupportsFilterPushDown >> > is >> > > a special case here because understanding the filter expressions is >> > > optional for the source implementation. In other use cases, if >> > > understanding the reference to a nested field is a must have, the user >> > code >> > > has to be changed, regardless of which approach we take to support >> nested >> > > fields. >> > > >> > > Therefore, I think we have to check each public API where the nested >> > field >> > > reference is exposed. If we have many public APIs where understanding >> > > nested fields is optional for the user / plugin / connector >> developers, >> > > having a separate NestedFieldReferenceExpression would have a more >> smooth >> > > migration. Otherwise, there seems to be no difference between the two >> > > approaches. >> > > >> > > Migration path aside, the main reason I prefer extending >> > > FieldReferenceExpression over a new NestedFieldReferenceExpression is >> > > because this makes the SupportsProjectionPushDown interface simpler. >> > > Otherwise, we have to treat it as a special case that does not match >> the >> > > overall API style. Or we have to introduce two different >> > applyProjections() >> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression >> > > respectively. This issue further extends to implementation in >> addition to >> > > public API. A single FieldReferenceExpression might help simplify the >> > > implementation code a little bit. For example, in a recursive >> processing >> > of >> > > a row with nested rows, we may not need to switch between >> > > FieldReferenceExpression and NestedFieldReferenceExpression depending >> on >> > > whether the record being processed is a top level record or nested >> > record. >> > > >> > > Thanks, >> > > >> > > Jiangjie (Becket) Qin >> > > >> > > >> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <imj...@gmail.com> wrote: >> > > >> > > > Hi Becket, >> > > > >> > > > I totally agree we should try to have a consistent API for a final >> > state. >> > > > The only concern I have mentioned is the "smooth" migration path. >> > > > The FiledReferenceExpression is widely used in many public APIs, >> > > > not only in the SupportsFilterPushDown. Yes, we can change every >> > > > methods in 2-steps, but is it good to change API back and forth for >> > this? >> > > > Personally, I'm fine with a separate NestedFieldReferenceExpression >> > > class. >> > > > TBH, I prefer the separated way because it makes the reference >> > expression >> > > > more clear and concise. >> > > > >> > > > Best, >> > > > Jark >> > > > >> > > > >> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com> >> wrote: >> > > > >> > > > > Thanks for the reply, Jark. >> > > > > >> > > > > I think it will be helpful to understand the final state we want >> to >> > > > > eventually achieve first, then we can discuss the steps towards >> that >> > > > final >> > > > > state. >> > > > > >> > > > > It looks like there are two proposed end states now: >> > > > > >> > > > > 1. Have a separate NestedFieldReferenceExpression class; keep >> > > > > SupportsFilterPushDown and SupportsProjectionPushDown the same. >> It is >> > > > just >> > > > > a one step change. >> > > > > - Regarding the supportsNestedFilterPushDown() method, if our >> > > contract >> > > > > with the connector developer today is "The implementation should >> > ignore >> > > > > unrecognized expressions by putting them into the remaining >> filters, >> > > > > instead of throwing exceptions". Then there is no need for this >> > > method. I >> > > > > am not sure about the current contract. We should probably make it >> > > clear >> > > > in >> > > > > the interface Java doc. >> > > > > >> > > > > 2. Extend the existing FiledReferenceExpression class to support >> > nested >> > > > > fields; SupportsFilterPushDown only has one method of >> > > > > applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown >> > only >> > > > has >> > > > > one method of applyProjections(List<FieldReferenceExpression>, >> > > DataType). >> > > > > It could just be two steps if we are not too obsessed with the >> exact >> > > > names >> > > > > of "applyFilters" and "applyProjections". More specifically, it >> takes >> > > two >> > > > > steps to achieve this final state: >> > > > > a. introduce a new method >> > tryApplyFilters(List<ResolvedExpression>) >> > > > to >> > > > > SupportsFilterPushDown, which may have FiledReferenceExpression >> with >> > > > nested >> > > > > fields. The default implementation throws an exception. The >> runtime >> > > will >> > > > > first call tryApplyFilters() with nested fields. In case of >> > exception, >> > > it >> > > > > calls the existing applyFilters() without including the nested >> > filters. >> > > > > Similarly, in SupportsProjectionPushDown, introduce a >> > > > > tryApplyProjections<List<NestedFieldReference> method returning a >> > > Result. >> > > > > The Result also contains the accepted and unapplicable >> projections. >> > The >> > > > > default implementation also throws an exception. Deprecate all the >> > > other >> > > > > methods except tryApplyFilters() and tryApplyProjections(). >> > > > > b. remove the deprecated methods in the next major version >> bump. >> > > > > >> > > > > Now the question is putting the migration steps aside, which end >> > state >> > > do >> > > > > we prefer? While the first end state is acceptable for me, >> > personally, >> > > I >> > > > > prefer the latter if we are designing from scratch. It is clean, >> > > > consistent >> > > > > and intuitive. Given the size of Flink, keeping APIs in the same >> > style >> > > > over >> > > > > time is important. The migration is also not that complicated. >> > > > > >> > > > > Thanks, >> > > > > >> > > > > Jiangjie (Becket) Qin >> > > > > >> > > > > >> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote: >> > > > > >> > > > > > Hi Venkat, >> > > > > > >> > > > > > Thanks for the proposal. >> > > > > > >> > > > > > I have some minor comments about the FLIP. >> > > > > > >> > > > > > 1. I think we don't need to >> > > > > > add SupportsFilterPushDown#supportsNestedFilters() method, >> > > > > > because connectors can skip nested filters by putting them in >> > > > > > Result#remainingFilters(). >> > > > > > And this is backward-compatible because unknown expressions were >> > > added >> > > > to >> > > > > > the remaining filters. >> > > > > > Planner should push predicate expressions as more as possible. >> If >> > we >> > > > add >> > > > > a >> > > > > > flag for each new filter, >> > > > > > the interface will be filled with lots of flags (e.g., >> > > supportsBetween, >> > > > > > supportsIN). >> > > > > > >> > > > > > 2. NestedFieldReferenceExpression#nestedFieldName should be an >> > array >> > > of >> > > > > > field names? >> > > > > > Each string represents a field name part of the field path. Just >> > keep >> > > > > > aligning with `nestedFieldIndexArray`. >> > > > > > >> > > > > > 3. My concern about making FieldReferenceExpression support >> nested >> > > > fields >> > > > > > is the compatibility. >> > > > > > It is a public API and users/connectors are already using it. >> > People >> > > > > > assumed it is a top-level column >> > > > > > reference, and applied logic on it. But that's not true now and >> > this >> > > > may >> > > > > > lead to unexpected errors. >> > > > > > Having a separate NestedFieldReferenceExpression sounds safer to >> > me. >> > > > > Mixing >> > > > > > them in a class may >> > > > > > confuse users what's the meaning of getFieldName() and >> > > > getFieldIndex(). >> > > > > > >> > > > > > >> > > > > > Regarding using NestedFieldReferenceExpression in >> > > > > > SupportsProjectionPushDown, do you >> > > > > > have any concerns @Timo Walther <twal...@apache.org> ? >> > > > > > >> > > > > > Best, >> > > > > > Jark >> > > > > > >> > > > > > >> > > > > > >> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan < >> > > > > vsowr...@asu.edu >> > > > > > > >> > > > > > wrote: >> > > > > > >> > > > > > > Sounds like a great suggestion, Becket. +1. Agree with >> cleaning >> > up >> > > > the >> > > > > > APIs >> > > > > > > and making it consistent in all the pushdown APIs. >> > > > > > > >> > > > > > > Your suggested approach seems fine to me, unless anyone else >> has >> > > any >> > > > > > other >> > > > > > > concerns. Just have couple of clarifying questions: >> > > > > > > >> > > > > > > 1. Do you think we should standardize the APIs across all the >> > > > pushdown >> > > > > > > supports like SupportsPartitionPushdown, >> SupportsDynamicFiltering >> > > etc >> > > > > in >> > > > > > > the end state? >> > > > > > > >> > > > > > > The current proposal works if we do not want to migrate >> > > > > > > > SupportsFilterPushdown to also use >> > NestedFieldReferenceExpression >> > > > in >> > > > > > the >> > > > > > > > long term. >> > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead of >> > > > > > > *NestedFieldReferenceExpression*? >> > > > > > > >> > > > > > > 2. Extend the FieldReferenceExpression to support nested >> fields. >> > > > > > > > - Change the index field type from int to int[]. >> > > > > > > >> > > > > > > - Add a new method int[] getFieldIndexArray(). >> > > > > > > > - Deprecate the int getFieldIndex() method, the code >> will >> > be >> > > > > > removed >> > > > > > > in >> > > > > > > > the next major version bump. >> > > > > > > >> > > > > > > I assume getFieldIndex would return fieldIndexArray[0], right? >> > > > > > > >> > > > > > > Thanks >> > > > > > > Venkat >> > > > > > > >> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin < >> becket....@gmail.com >> > > >> > > > > wrote: >> > > > > > > >> > > > > > > > Thanks for the proposal, Venkata. >> > > > > > > > >> > > > > > > > The current proposal works if we do not want to migrate >> > > > > > > > SupportsFilterPushdown to also use >> > NestedFieldReferenceExpression >> > > > in >> > > > > > the >> > > > > > > > long term. >> > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead of >> > > > > > > *NestedFieldReferenceExpression*? >> > > > > > > >> > > > > > > > >> > > > > > > > Otherwise, the alternative solution briefly mentioned in the >> > > > rejected >> > > > > > > > alternatives would be the following: >> > > > > > > > Phase 1: >> > > > > > > > 1. Introduce a supportsNestedFilters() method to the >> > > > > > > SupportsFilterPushdown >> > > > > > > > interface. (same as current proposal). >> > > > > > > > 2. Extend the FieldReferenceExpression to support nested >> > fields. >> > > > > > > > - Change the index field type from int to int[]. >> > > > > > > >> > > > > > > - Add a new method int[] getFieldIndexArray(). >> > > > > > > > - Deprecate the int getFieldIndex() method, the code >> will >> > be >> > > > > > removed >> > > > > > > in >> > > > > > > > the next major version bump. >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > 3. In the SupportsProjectionPushDown interface >> > > > > > > > - add a new method >> > > > > applyProjection(List<FieldReferenceExpression>, >> > > > > > > > DataType), with default implementation invoking >> > > > > > applyProjection(int[][], >> > > > > > > > DataType) >> > > > > > > > - deprecate the current applyProjection(int[][], >> DataType) >> > > > method >> > > > > > > > >> > > > > > > > Phase 2 (in the next major version bump) >> > > > > > > > 1. remove the deprecated methods. >> > > > > > > > >> > > > > > > > Phase 3 (optional) >> > > > > > > > 1. deprecate and remove the supportsNestedFilters() / >> > > > > > > > supportsNestedProjection() methods from the >> > > SupportsFilterPushDown >> > > > / >> > > > > > > > SupportsProjectionPushDown interfaces. >> > > > > > > > >> > > > > > > > Personally I prefer this alternative. It takes longer to >> finish >> > > the >> > > > > > work, >> > > > > > > > but the API eventually becomes clean and consistent. But I >> can >> > > live >> > > > > > with >> > > > > > > > the current proposal. >> > > > > > > > >> > > > > > > > Thanks, >> > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin >> > > > > > > > >> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan >> < >> > > > > > > > vsowr...@asu.edu> wrote: >> > > > > > > > >> > > > > > > > > Gentle ping for reviews/feedback. >> > > > > > > > > >> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan < >> > > > > > > > vsowr...@asu.edu >> > > > > > > > > > >> > > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Hi All, >> > > > > > > > > > >> > > > > > > > > > I am opening this thread to discuss FLIP-356: Support >> > Nested >> > > > > Fields >> > > > > > > > > > Filter Pushdown. The FLIP can be found at >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ >> > > > > > > > > > >> > > > > > > > > > This FLIP adds support for pushing down nested fields >> > filters >> > > > to >> > > > > > the >> > > > > > > > > > underlying TableSource. In our data lake, we find a lot >> of >> > > > > datasets >> > > > > > > > have >> > > > > > > > > > nested fields and also user queries with filters >> defined on >> > > the >> > > > > > > nested >> > > > > > > > > > fields. This would drastically improve the performance >> for >> > > > those >> > > > > > sets >> > > > > > > > of >> > > > > > > > > > queries. >> > > > > > > > > > >> > > > > > > > > > Appreciate any comments or feedback you may have on this >> > > > > proposal. >> > > > > > > > > > >> > > > > > > > > > Regards >> > > > > > > > > > Venkata krishnan >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >