Hi Jinsong, Thanks for your comments.
What is inputIndex in NestedFieldReferenceExpression? I haven't looked at it before. Do you mean, given that it is now only used to push filters it won't be subsequently used in further planning/optimization and therefore it is not required at this time? So if NestedFieldReferenceExpression doesn't need inputIndex, is there > a need to introduce a base class `ReferenceExpression`? For SupportsFilterPushDown itself, *ReferenceExpression* base class is not needed. But there were discussions around cleaning up and standardizing the API for Supports*PushDown. SupportsProjectionPushDown currently pushes the projects as a 2-d array, instead it would be better to use the standard API which seems to be the *ResolvedExpression*. For SupportsProjectionPushDown either FieldReferenceExpression (top level fields) or NestedFieldReferenceExpression (nested fields) is enough, in order to provide a single API that handles both top level and nested fields, ReferenceExpression will be introduced as a base class. Eventually, *SupportsProjectionPushDown#applyProjections* would evolve as applyProjection(List<ReferenceExpression> projectedFields) and nested fields would be pushed only if *supportsNestedProjections* returns true. Regards Venkata krishnan On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li <jingsongl...@gmail.com> wrote: > So if NestedFieldReferenceExpression doesn't need inputIndex, is there > a need to introduce a base class `ReferenceExpression`? > > Best, > Jingsong > > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li <jingsongl...@gmail.com> > wrote: > > > > Hi thanks all for your discussion. > > > > What is inputIndex in NestedFieldReferenceExpression? > > > > I know inputIndex has special usage in FieldReferenceExpression, but > > it is only for Join operators, and it is only for SQL optimization. It > > looks like there is no requirement for Nested. > > > > Best, > > Jingsong > > > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan > > <vsowr...@asu.edu> wrote: > > > > > > Thanks for all the feedback and discussion everyone. Looks like we have > > > reached a consensus here. > > > > > > Just to summarize: > > > > > > 1. Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*) > > > abstract class which will be extended by both > *FieldReferenceExpression* > > > and *NestedFieldReferenceExpression* (to be introduced as part of this > FLIP) > > > 2. No need of *supportsNestedFilters *check as the current > > > *SupportsFilterPushDown* should already ignore unknown expressions ( > > > *NestedFieldReferenceExpression* for example) and return them as > > > *remainingFilters. > > > *Maybe this should be clarified explicitly in the Javadoc of > > > *SupportsFilterPushDown. > > > *I will file a separate JIRA to fix the documentation. > > > 3. Refactor *SupportsProjectionPushDown* to use *ReferenceExpression > *instead > > > of existing 2-d arrays to consolidate and be consistent with other > > > Supports*PushDown APIs - *outside the scope of this FLIP* > > > 4. Similarly *SupportsAggregatePushDown* should also be evolved > whenever > > > nested fields support is added to use the *ReferenceExpression - > **outside > > > the scope of this FLIP* > > > > > > Does this sound good? Please let me know if I have missed anything > here. If > > > there are no concerns, I will start a vote tomorrow. I will also get > the > > > FLIP-356 wiki updated. Thanks everyone once again! > > > > > > Regards > > > Venkata krishnan > > > > > > > > > On Thu, Aug 24, 2023 at 8:19 PM Becket Qin <becket....@gmail.com> > wrote: > > > > > > > Hi Jark, > > > > > > > > How about having a separate NestedFieldReferenceExpression, and > > > > > abstracting a common base class "ReferenceExpression" for > > > > > NestedFieldReferenceExpression and FieldReferenceExpression? This > makes > > > > > unifying expressions in > > > > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > ...)" > > > > > possible. > > > > > > > > > > > > I'd be fine with this. It at least provides a consistent API style / > > > > formality. > > > > > > > > Re: Yunhong, > > > > > > > > 3. Finally, I think we need to look at the costs and benefits of > unifying > > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or > others) > > > > from > > > > > the perspective of interface implementers. A stable API can reduce > user > > > > > development and change costs, if the current API can fully meet the > > > > > functional requirements at the framework level, I personal suggest > > > > reducing > > > > > the impact on connector developers. > > > > > > > > > > > > > I agree that the cost and benefit should be measured. And the > measurement > > > > should be in the long term instead of short term. That is why we > always > > > > need to align on the ideal end state first. > > > > Meeting functionality requirements is the bare minimum bar for an > API. > > > > Simplicity, intuitiveness, robustness and evolvability are also > important. > > > > In addition, for projects with many APIs, such as Flink, a > consistent API > > > > style is also critical for the user adoption as well as bug > avoidance. It > > > > is very helpful for the community to agree on some API design > conventions / > > > > principles. > > > > For example, in this particular case, via our discussion, hopefully > we sort > > > > of established the following API design conventions / principles for > all > > > > the Supports*PushDown interfaces. > > > > > > > > 1. By default, expressions should be used if applicable instead of > other > > > > representations. > > > > 2. In general, the pushdown method should not assume all the > pushdowns will > > > > succeed. So the applyX() method should return a boolean or List<X>, > to > > > > handle the cases that some of the pushdowns cannot be fulfilled by > the > > > > implementation. > > > > > > > > Establishing such conventions and principles demands careful > thinking for > > > > the aspects I mentioned earlier in addition to the API > functionalities. > > > > This helps lower the bar of understanding, reduces the chance of > having > > > > loose ends in the API, and will benefit all the participants in the > project > > > > over time. I think this is the right way to achieve real API > stability. > > > > Otherwise, we may end up chasing our tails to find ways not to > change the > > > > existing non-ideal APIs. > > > > > > > > Thanks, > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > On Fri, Aug 25, 2023 at 9:33 AM yh z <zhengyunhon...@gmail.com> > wrote: > > > > > > > > > Hi, Venkat, > > > > > > > > > > Thanks for the FLIP, it sounds good to support nested fields filter > > > > > pushdown. Based on the design of flip and the above options, I > would like > > > > > to make a few suggestions: > > > > > > > > > > 1. At present, introducing NestedFieldReferenceExpression looks > like a > > > > > better solution, which can fully meet our requirements while > reducing > > > > > modifications to base class FieldReferenceExpression. In the long > run, I > > > > > tend to abstract a basic class for NestedFieldReferenceExpression > and > > > > > FieldReferenceExpression as u suggested. > > > > > > > > > > 2. Personally, I don't recommend introducing > *supportsNestedFilters() in > > > > > supportsFilterPushdown. We just need to better declare the return > value > > > > of > > > > > the method *applyFilters. > > > > > > > > > > 3. Finally, I think we need to look at the costs and benefits of > unifying > > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or > others) > > > > from > > > > > the perspective of interface implementers. A stable API can reduce > user > > > > > development and change costs, if the current API can fully meet the > > > > > functional requirements at the framework level, I personal suggest > > > > reducing > > > > > the impact on connector developers. > > > > > > > > > > Regards, > > > > > Yunhong Zheng (Swuferhong) > > > > > > > > > > > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月25日周五 > 01:25写道: > > > > > > > > > > > To keep it backwards compatible, introduce another API > *applyAggregates > > > > > > *with > > > > > > *List<ReferenceExpression> *when nested field support is added > and > > > > > > deprecate the current API. This will by default throw an > exception. In > > > > > > flink planner, *applyAggregates *with nested fields and if it > throws > > > > > > exception then *applyAggregates* without nested fields. > > > > > > > > > > > > Regards > > > > > > Venkata krishnan > > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan < > > > > > > vsowr...@asu.edu> wrote: > > > > > > > > > > > > > Jark, > > > > > > > > > > > > > > How about having a separate NestedFieldReferenceExpression, and > > > > > > >> abstracting a common base class "ReferenceExpression" for > > > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression? > This > > > > > makes > > > > > > >> unifying expressions in > > > > > > >> > > > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > > >> ...)" > > > > > > >> possible. > > > > > > > > > > > > > > This should be fine for *SupportsProjectionPushDown* and > > > > > > > *SupportsFilterPushDown*. One concern in the case of > > > > > > > *SupportsAggregatePushDown* with nested fields support (to be > added > > > > in > > > > > > > the future), with this proposal, the API will become backwards > > > > > > incompatible > > > > > > > as the *args *for the aggregate function is > > > > > > *List<FieldReferenceExpression> > > > > > > > *that needs to change to *List<ReferenceExpression>*. > > > > > > > > > > > > > > Regards > > > > > > > Venkata krishnan > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> > wrote: > > > > > > > > > > > > > >> Hi Becket, > > > > > > >> > > > > > > >> I think it is the second case, that a > FieldReferenceExpression is > > > > > > >> constructed > > > > > > >> by the framework and passed to the connector (interfaces > listed by > > > > > > >> Venkata[1] > > > > > > >> and Catalog#listPartitionsByFilter). Besides, understanding > the > > > > nested > > > > > > >> field > > > > > > >> is optional for users/connectors (just treat it as an unknown > > > > > expression > > > > > > >> if > > > > > > >> the > > > > > > >> connector doesn't want to support it). > > > > > > >> > > > > > > >> If we extend FieldReferenceExpression, in the case of "where > > > > > col.nested > > > > > > > > > > > > > >> 10", > > > > > > >> for the connectors already supported filter/delete pushdown, > they > > > > may > > > > > > >> wrongly > > > > > > >> pushdown "col > 10" instead of "nested > 10" because they > still > > > > treat > > > > > > >> FieldReferenceExpression as a top-level column. This problem > can be > > > > > > >> resolved > > > > > > >> by introducing an additional "supportedNestedPushdown" for > each > > > > > > interface, > > > > > > >> but that method is not elegant and is hard to remove in the > future, > > > > > and > > > > > > >> this could > > > > > > >> be avoided if we have a separate > NestedFieldReferenceExpression. > > > > > > >> > > > > > > >> If we want to extend FieldReferenceExpression, we have to add > > > > > > protections > > > > > > >> for every related API in one shot. Besides, > FieldReferenceExpression > > > > > is > > > > > > a > > > > > > >> fundamental class in the planner, we have to go through all > the code > > > > > > that > > > > > > >> is using it to make sure it properly handling it if it is a > nested > > > > > field > > > > > > >> which > > > > > > >> is a big effort for the community. > > > > > > >> > > > > > > >> If we were designing this API on day 1, I fully support > merging them > > > > > in > > > > > > a > > > > > > >> FieldReferenceExpression. But in this case, I'm thinking > about how > > > > to > > > > > > >> provide > > > > > > >> users with a smooth migration path, and allow the community to > > > > > gradually > > > > > > >> put efforts into evolving the API, and not block the "Nested > Fields > > > > > > Filter > > > > > > >> Pushdown" > > > > > > >> requirement. > > > > > > >> > > > > > > >> How about having a separate NestedFieldReferenceExpression, > and > > > > > > >> abstracting a common base class "ReferenceExpression" for > > > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression? > This > > > > > makes > > > > > > >> unifying expressions in > > > > > > >> > > > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > > >> ...)" > > > > > > >> possible. > > > > > > >> > > > > > > >> Best, > > > > > > >> Jark > > > > > > >> > > > > > > >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan < > > > > > > >> vsowr...@asu.edu> > > > > > > >> wrote: > > > > > > >> > > > > > > >> > Becket and Jark, > > > > > > >> > > > > > > > >> > Deprecate all the other > > > > > > >> > > methods except tryApplyFilters() and > tryApplyProjections(). > > > > > > >> > > > > > > > >> > For *SupportsProjectionPushDown*, we still need a > > > > > > >> > *supportsNestedProjections* API on the table source as some > of the > > > > > > table > > > > > > >> > sources might not be able to handle nested fields and > therefore > > > > the > > > > > > >> Flink > > > > > > >> > planner should not push down the nested projections or else > the > > > > > > >> > *applyProjection > > > > > > >> > *API has to be appropriately changed to return > > > > > > >> > *unconvertibleProjections *similar > > > > > > >> > to *SupportsFilterPushDown*. > > > > > > >> > > > > > > > >> > Or we have to introduce two different applyProjections() > > > > > > >> > > methods for FieldReferenceExpression / > > > > > > NestedFieldReferenceExpression > > > > > > >> > > respectively. > > > > > > >> > > > > > > > >> > Agree this is not preferred. Given that > *supportNestedProjections > > > > > > >> *cannot > > > > > > >> > be deprecated/removed based on the current API form, > extending > > > > > > >> > *FieldReferenceExpression* to support nested fields should > be > > > > okay. > > > > > > >> > > > > > > > >> > Another alternative could be to change *applyProjections > *to take > > > > > > >> > List<ResolvedExpression> and on the connector side they > choose to > > > > > > handle > > > > > > >> > *FieldReferenceExpression* and > *NestedFieldReferenceExpression *as > > > > > > >> > applicable and return the remainingProjections. In the case > of > > > > > nested > > > > > > >> field > > > > > > >> > projections not supported, it should return them back but > only > > > > > > >> projecting > > > > > > >> > the top level fields. IMO, this is also *not preferred*. > > > > > > >> > > > > > > > >> > *SupportsAggregatePushDown* > > > > > > >> > > > > > > > >> > *AggregateExpression *currently takes in a list of > > > > > > >> > *FieldReferenceExpression* as args for the aggregate > function, if > > > > in > > > > > > >> future > > > > > > >> > *SupportsAggregatePushDown* adds support for aggregate > pushdown on > > > > > > >> nested > > > > > > >> > fields then the AggregateExpression API also has to change > if a > > > > new > > > > > > >> > NestedFieldReferenceExpression is introduced for nested > fields. > > > > > > >> > > > > > > > >> > If we add a > > > > > > >> > > flag for each new filter, > > > > > > >> > > the interface will be filled with lots of flags (e.g., > > > > > > >> supportsBetween, > > > > > > >> > > supportsIN) > > > > > > >> > > > > > > > >> > In an ideal situation, I completely agree with you. But in > the > > > > > current > > > > > > >> > state, *supportsNestedFilters* can act as a bridge to reach > the > > > > > > eventual > > > > > > >> > desired state which is to have a clean and consistent set > of APIs > > > > > > >> > throughout all Supports*PushDown. > > > > > > >> > > > > > > > >> > Also shared some thoughts on the end state API > > > > > > >> > < > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$ > > > > > > >> > > > > > > > > >> > with extension to the *FieldReferenceExpression* to support > nested > > > > > > >> fields. > > > > > > >> > Please take a look. > > > > > > >> > > > > > > > >> > Regards > > > > > > >> > Venkata krishnan > > > > > > >> > > > > > > > >> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin < > becket....@gmail.com> > > > > > > >> wrote: > > > > > > >> > > > > > > > >> > > Hi Jark, > > > > > > >> > > > > > > > > >> > > Regarding the migration path, it would be useful to > scrutinize > > > > the > > > > > > use > > > > > > >> > case > > > > > > >> > > of FiledReferenceExpression and ResolvedExpressions. > There are > > > > two > > > > > > >> kinds > > > > > > >> > of > > > > > > >> > > use cases: > > > > > > >> > > > > > > > > >> > > 1. A ResolvedExpression is constructed by the user or > connector > > > > / > > > > > > >> plugin > > > > > > >> > > developers. > > > > > > >> > > 2. A ResolvedExpression is constructed by the framework > and > > > > passed > > > > > > to > > > > > > >> > user > > > > > > >> > > or connector / plugin developers. > > > > > > >> > > > > > > > > >> > > For the first case, both of the approaches provide the > same > > > > > > migration > > > > > > >> > > experience. > > > > > > >> > > > > > > > > >> > > For the second case, generally speaking, introducing > > > > > > >> > > NestedFieldReferenceExpression and extending > > > > > > FieldReferenceExpression > > > > > > >> > would > > > > > > >> > > have the same impact for backwards compatibility. > > > > > > >> SupportsFilterPushDown > > > > > > >> > is > > > > > > >> > > a special case here because understanding the filter > expressions > > > > > is > > > > > > >> > > optional for the source implementation. In other use > cases, if > > > > > > >> > > understanding the reference to a nested field is a must > have, > > > > the > > > > > > user > > > > > > >> > code > > > > > > >> > > has to be changed, regardless of which approach we take to > > > > support > > > > > > >> nested > > > > > > >> > > fields. > > > > > > >> > > > > > > > > >> > > Therefore, I think we have to check each public API where > the > > > > > nested > > > > > > >> > field > > > > > > >> > > reference is exposed. If we have many public APIs where > > > > > > understanding > > > > > > >> > > nested fields is optional for the user / plugin / > connector > > > > > > >> developers, > > > > > > >> > > having a separate NestedFieldReferenceExpression would > have a > > > > more > > > > > > >> smooth > > > > > > >> > > migration. Otherwise, there seems to be no difference > between > > > > the > > > > > > two > > > > > > >> > > approaches. > > > > > > >> > > > > > > > > >> > > Migration path aside, the main reason I prefer extending > > > > > > >> > > FieldReferenceExpression over a new > > > > NestedFieldReferenceExpression > > > > > > is > > > > > > >> > > because this makes the SupportsProjectionPushDown > interface > > > > > simpler. > > > > > > >> > > Otherwise, we have to treat it as a special case that > does not > > > > > match > > > > > > >> the > > > > > > >> > > overall API style. Or we have to introduce two different > > > > > > >> > applyProjections() > > > > > > >> > > methods for FieldReferenceExpression / > > > > > > NestedFieldReferenceExpression > > > > > > >> > > respectively. This issue further extends to > implementation in > > > > > > >> addition to > > > > > > >> > > public API. A single FieldReferenceExpression might help > > > > simplify > > > > > > the > > > > > > >> > > implementation code a little bit. For example, in a > recursive > > > > > > >> processing > > > > > > >> > of > > > > > > >> > > a row with nested rows, we may not need to switch between > > > > > > >> > > FieldReferenceExpression and > NestedFieldReferenceExpression > > > > > > depending > > > > > > >> on > > > > > > >> > > whether the record being processed is a top level record > or > > > > nested > > > > > > >> > record. > > > > > > >> > > > > > > > > >> > > Thanks, > > > > > > >> > > > > > > > > >> > > Jiangjie (Becket) Qin > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu < > imj...@gmail.com> > > > > > wrote: > > > > > > >> > > > > > > > > >> > > > Hi Becket, > > > > > > >> > > > > > > > > > >> > > > I totally agree we should try to have a consistent API > for a > > > > > final > > > > > > >> > state. > > > > > > >> > > > The only concern I have mentioned is the "smooth" > migration > > > > > path. > > > > > > >> > > > The FiledReferenceExpression is widely used in many > public > > > > APIs, > > > > > > >> > > > not only in the SupportsFilterPushDown. Yes, we can > change > > > > every > > > > > > >> > > > methods in 2-steps, but is it good to change API back > and > > > > forth > > > > > > for > > > > > > >> > this? > > > > > > >> > > > Personally, I'm fine with a separate > > > > > > NestedFieldReferenceExpression > > > > > > >> > > class. > > > > > > >> > > > TBH, I prefer the separated way because it makes the > reference > > > > > > >> > expression > > > > > > >> > > > more clear and concise. > > > > > > >> > > > > > > > > > >> > > > Best, > > > > > > >> > > > Jark > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin < > > > > becket....@gmail.com> > > > > > > >> wrote: > > > > > > >> > > > > > > > > > >> > > > > Thanks for the reply, Jark. > > > > > > >> > > > > > > > > > > >> > > > > I think it will be helpful to understand the final > state we > > > > > want > > > > > > >> to > > > > > > >> > > > > eventually achieve first, then we can discuss the > steps > > > > > towards > > > > > > >> that > > > > > > >> > > > final > > > > > > >> > > > > state. > > > > > > >> > > > > > > > > > > >> > > > > It looks like there are two proposed end states now: > > > > > > >> > > > > > > > > > > >> > > > > 1. Have a separate NestedFieldReferenceExpression > class; > > > > keep > > > > > > >> > > > > SupportsFilterPushDown and SupportsProjectionPushDown > the > > > > > same. > > > > > > >> It is > > > > > > >> > > > just > > > > > > >> > > > > a one step change. > > > > > > >> > > > > - Regarding the supportsNestedFilterPushDown() > method, if > > > > > our > > > > > > >> > > contract > > > > > > >> > > > > with the connector developer today is "The > implementation > > > > > should > > > > > > >> > ignore > > > > > > >> > > > > unrecognized expressions by putting them into the > remaining > > > > > > >> filters, > > > > > > >> > > > > instead of throwing exceptions". Then there is no > need for > > > > > this > > > > > > >> > > method. I > > > > > > >> > > > > am not sure about the current contract. We should > probably > > > > > make > > > > > > it > > > > > > >> > > clear > > > > > > >> > > > in > > > > > > >> > > > > the interface Java doc. > > > > > > >> > > > > > > > > > > >> > > > > 2. Extend the existing FiledReferenceExpression class > to > > > > > support > > > > > > >> > nested > > > > > > >> > > > > fields; SupportsFilterPushDown only has one method of > > > > > > >> > > > > applyFilters(List<ResolvedExpression>); > > > > > > SupportsProjectionPushDown > > > > > > >> > only > > > > > > >> > > > has > > > > > > >> > > > > one method of > > > > applyProjections(List<FieldReferenceExpression>, > > > > > > >> > > DataType). > > > > > > >> > > > > It could just be two steps if we are not too obsessed > with > > > > the > > > > > > >> exact > > > > > > >> > > > names > > > > > > >> > > > > of "applyFilters" and "applyProjections". More > specifically, > > > > > it > > > > > > >> takes > > > > > > >> > > two > > > > > > >> > > > > steps to achieve this final state: > > > > > > >> > > > > a. introduce a new method > > > > > > >> > tryApplyFilters(List<ResolvedExpression>) > > > > > > >> > > > to > > > > > > >> > > > > SupportsFilterPushDown, which may have > > > > > FiledReferenceExpression > > > > > > >> with > > > > > > >> > > > nested > > > > > > >> > > > > fields. The default implementation throws an > exception. The > > > > > > >> runtime > > > > > > >> > > will > > > > > > >> > > > > first call tryApplyFilters() with nested fields. In > case of > > > > > > >> > exception, > > > > > > >> > > it > > > > > > >> > > > > calls the existing applyFilters() without including > the > > > > nested > > > > > > >> > filters. > > > > > > >> > > > > Similarly, in SupportsProjectionPushDown, introduce a > > > > > > >> > > > > tryApplyProjections<List<NestedFieldReference> method > > > > > returning > > > > > > a > > > > > > >> > > Result. > > > > > > >> > > > > The Result also contains the accepted and unapplicable > > > > > > >> projections. > > > > > > >> > The > > > > > > >> > > > > default implementation also throws an exception. > Deprecate > > > > all > > > > > > the > > > > > > >> > > other > > > > > > >> > > > > methods except tryApplyFilters() and > tryApplyProjections(). > > > > > > >> > > > > b. remove the deprecated methods in the next major > > > > version > > > > > > >> bump. > > > > > > >> > > > > > > > > > > >> > > > > Now the question is putting the migration steps > aside, which > > > > > end > > > > > > >> > state > > > > > > >> > > do > > > > > > >> > > > > we prefer? While the first end state is acceptable > for me, > > > > > > >> > personally, > > > > > > >> > > I > > > > > > >> > > > > prefer the latter if we are designing from scratch. > It is > > > > > clean, > > > > > > >> > > > consistent > > > > > > >> > > > > and intuitive. Given the size of Flink, keeping APIs > in the > > > > > same > > > > > > >> > style > > > > > > >> > > > over > > > > > > >> > > > > time is important. The migration is also not that > > > > complicated. > > > > > > >> > > > > > > > > > > >> > > > > Thanks, > > > > > > >> > > > > > > > > > > >> > > > > Jiangjie (Becket) Qin > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu < > imj...@gmail.com> > > > > > > wrote: > > > > > > >> > > > > > > > > > > >> > > > > > Hi Venkat, > > > > > > >> > > > > > > > > > > > >> > > > > > Thanks for the proposal. > > > > > > >> > > > > > > > > > > > >> > > > > > I have some minor comments about the FLIP. > > > > > > >> > > > > > > > > > > > >> > > > > > 1. I think we don't need to > > > > > > >> > > > > > add SupportsFilterPushDown#supportsNestedFilters() > method, > > > > > > >> > > > > > because connectors can skip nested filters by > putting them > > > > > in > > > > > > >> > > > > > Result#remainingFilters(). > > > > > > >> > > > > > And this is backward-compatible because unknown > > > > expressions > > > > > > were > > > > > > >> > > added > > > > > > >> > > > to > > > > > > >> > > > > > the remaining filters. > > > > > > >> > > > > > Planner should push predicate expressions as more as > > > > > possible. > > > > > > >> If > > > > > > >> > we > > > > > > >> > > > add > > > > > > >> > > > > a > > > > > > >> > > > > > flag for each new filter, > > > > > > >> > > > > > the interface will be filled with lots of flags > (e.g., > > > > > > >> > > supportsBetween, > > > > > > >> > > > > > supportsIN). > > > > > > >> > > > > > > > > > > > >> > > > > > 2. NestedFieldReferenceExpression#nestedFieldName > should > > > > be > > > > > an > > > > > > >> > array > > > > > > >> > > of > > > > > > >> > > > > > field names? > > > > > > >> > > > > > Each string represents a field name part of the > field > > > > path. > > > > > > Just > > > > > > >> > keep > > > > > > >> > > > > > aligning with `nestedFieldIndexArray`. > > > > > > >> > > > > > > > > > > > >> > > > > > 3. My concern about making FieldReferenceExpression > > > > support > > > > > > >> nested > > > > > > >> > > > fields > > > > > > >> > > > > > is the compatibility. > > > > > > >> > > > > > It is a public API and users/connectors are already > using > > > > > it. > > > > > > >> > People > > > > > > >> > > > > > assumed it is a top-level column > > > > > > >> > > > > > reference, and applied logic on it. But that's not > true > > > > now > > > > > > and > > > > > > >> > this > > > > > > >> > > > may > > > > > > >> > > > > > lead to unexpected errors. > > > > > > >> > > > > > Having a separate NestedFieldReferenceExpression > sounds > > > > > safer > > > > > > to > > > > > > >> > me. > > > > > > >> > > > > Mixing > > > > > > >> > > > > > them in a class may > > > > > > >> > > > > > confuse users what's the meaning of getFieldName() > and > > > > > > >> > > > getFieldIndex(). > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > Regarding using NestedFieldReferenceExpression in > > > > > > >> > > > > > SupportsProjectionPushDown, do you > > > > > > >> > > > > > have any concerns @Timo Walther <twal...@apache.org> > ? > > > > > > >> > > > > > > > > > > > >> > > > > > Best, > > > > > > >> > > > > > Jark > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan > Sowrirajan < > > > > > > >> > > > > vsowr...@asu.edu > > > > > > >> > > > > > > > > > > > > >> > > > > > wrote: > > > > > > >> > > > > > > > > > > > >> > > > > > > Sounds like a great suggestion, Becket. +1. Agree > with > > > > > > >> cleaning > > > > > > >> > up > > > > > > >> > > > the > > > > > > >> > > > > > APIs > > > > > > >> > > > > > > and making it consistent in all the pushdown APIs. > > > > > > >> > > > > > > > > > > > > >> > > > > > > Your suggested approach seems fine to me, unless > anyone > > > > > else > > > > > > >> has > > > > > > >> > > any > > > > > > >> > > > > > other > > > > > > >> > > > > > > concerns. Just have couple of clarifying > questions: > > > > > > >> > > > > > > > > > > > > >> > > > > > > 1. Do you think we should standardize the APIs > across > > > > all > > > > > > the > > > > > > >> > > > pushdown > > > > > > >> > > > > > > supports like SupportsPartitionPushdown, > > > > > > >> SupportsDynamicFiltering > > > > > > >> > > etc > > > > > > >> > > > > in > > > > > > >> > > > > > > the end state? > > > > > > >> > > > > > > > > > > > > >> > > > > > > The current proposal works if we do not want to > migrate > > > > > > >> > > > > > > > SupportsFilterPushdown to also use > > > > > > >> > NestedFieldReferenceExpression > > > > > > >> > > > in > > > > > > >> > > > > > the > > > > > > >> > > > > > > > long term. > > > > > > >> > > > > > > > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead of > > > > > > >> > > > > > > *NestedFieldReferenceExpression*? > > > > > > >> > > > > > > > > > > > > >> > > > > > > 2. Extend the FieldReferenceExpression to support > nested > > > > > > >> fields. > > > > > > >> > > > > > > > - Change the index field type from int to > int[]. > > > > > > >> > > > > > > > > > > > > >> > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > > > >> > > > > > > > - Deprecate the int getFieldIndex() method, > the > > > > code > > > > > > >> will > > > > > > >> > be > > > > > > >> > > > > > removed > > > > > > >> > > > > > > in > > > > > > >> > > > > > > > the next major version bump. > > > > > > >> > > > > > > > > > > > > >> > > > > > > I assume getFieldIndex would return > fieldIndexArray[0], > > > > > > right? > > > > > > >> > > > > > > > > > > > > >> > > > > > > Thanks > > > > > > >> > > > > > > Venkat > > > > > > >> > > > > > > > > > > > > >> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin < > > > > > > >> becket....@gmail.com > > > > > > >> > > > > > > > > >> > > > > wrote: > > > > > > >> > > > > > > > > > > > > >> > > > > > > > Thanks for the proposal, Venkata. > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > The current proposal works if we do not want to > > > > migrate > > > > > > >> > > > > > > > SupportsFilterPushdown to also use > > > > > > >> > NestedFieldReferenceExpression > > > > > > >> > > > in > > > > > > >> > > > > > the > > > > > > >> > > > > > > > long term. > > > > > > >> > > > > > > > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead of > > > > > > >> > > > > > > *NestedFieldReferenceExpression*? > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > Otherwise, the alternative solution briefly > mentioned > > > > in > > > > > > the > > > > > > >> > > > rejected > > > > > > >> > > > > > > > alternatives would be the following: > > > > > > >> > > > > > > > Phase 1: > > > > > > >> > > > > > > > 1. Introduce a supportsNestedFilters() method > to the > > > > > > >> > > > > > > SupportsFilterPushdown > > > > > > >> > > > > > > > interface. (same as current proposal). > > > > > > >> > > > > > > > 2. Extend the FieldReferenceExpression to > support > > > > nested > > > > > > >> > fields. > > > > > > >> > > > > > > > - Change the index field type from int to > int[]. > > > > > > >> > > > > > > > > > > > > >> > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > > > >> > > > > > > > - Deprecate the int getFieldIndex() method, > the > > > > code > > > > > > >> will > > > > > > >> > be > > > > > > >> > > > > > removed > > > > > > >> > > > > > > in > > > > > > >> > > > > > > > the next major version bump. > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > 3. In the SupportsProjectionPushDown interface > > > > > > >> > > > > > > > - add a new method > > > > > > >> > > > > applyProjection(List<FieldReferenceExpression>, > > > > > > >> > > > > > > > DataType), with default implementation invoking > > > > > > >> > > > > > applyProjection(int[][], > > > > > > >> > > > > > > > DataType) > > > > > > >> > > > > > > > - deprecate the current > applyProjection(int[][], > > > > > > >> DataType) > > > > > > >> > > > method > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > Phase 2 (in the next major version bump) > > > > > > >> > > > > > > > 1. remove the deprecated methods. > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > Phase 3 (optional) > > > > > > >> > > > > > > > 1. deprecate and remove the > supportsNestedFilters() / > > > > > > >> > > > > > > > supportsNestedProjection() methods from the > > > > > > >> > > SupportsFilterPushDown > > > > > > >> > > > / > > > > > > >> > > > > > > > SupportsProjectionPushDown interfaces. > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > Personally I prefer this alternative. It takes > longer > > > > to > > > > > > >> finish > > > > > > >> > > the > > > > > > >> > > > > > work, > > > > > > >> > > > > > > > but the API eventually becomes clean and > consistent. > > > > > But I > > > > > > >> can > > > > > > >> > > live > > > > > > >> > > > > > with > > > > > > >> > > > > > > > the current proposal. > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > Thanks, > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan > > > > > > Sowrirajan > > > > > > >> < > > > > > > >> > > > > > > > vsowr...@asu.edu> wrote: > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Gentle ping for reviews/feedback. > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan > > > > > > Sowrirajan < > > > > > > >> > > > > > > > vsowr...@asu.edu > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > wrote: > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > Hi All, > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > I am opening this thread to discuss > FLIP-356: > > > > > Support > > > > > > >> > Nested > > > > > > >> > > > > Fields > > > > > > >> > > > > > > > > > Filter Pushdown. The FLIP can be found at > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > This FLIP adds support for pushing down > nested > > > > > fields > > > > > > >> > filters > > > > > > >> > > > to > > > > > > >> > > > > > the > > > > > > >> > > > > > > > > > underlying TableSource. In our data lake, > we find > > > > a > > > > > > lot > > > > > > >> of > > > > > > >> > > > > datasets > > > > > > >> > > > > > > > have > > > > > > >> > > > > > > > > > nested fields and also user queries with > filters > > > > > > >> defined on > > > > > > >> > > the > > > > > > >> > > > > > > nested > > > > > > >> > > > > > > > > > fields. This would drastically improve the > > > > > performance > > > > > > >> for > > > > > > >> > > > those > > > > > > >> > > > > > sets > > > > > > >> > > > > > > > of > > > > > > >> > > > > > > > > > queries. > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > Appreciate any comments or feedback you may > have > > > > on > > > > > > this > > > > > > >> > > > > proposal. > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > Regards > > > > > > >> > > > > > > > > > Venkata krishnan > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > >