Hi everyone, Posted a PR (https://github.com/apache/flink/pull/23313) to add nested fields filter pushdown. Please review. Thanks.
Regards Venkata krishnan On Tue, Sep 5, 2023 at 10:04 PM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote: > Based on an offline discussion with Becket Qin, I added *fieldIndices * > back which is the field index of the nested field at every level to the > *NestedFieldReferenceExpression > *in FLIP-356 > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown> > *. *2 reasons to do it: > > 1. Agree with using *fieldIndices *as the only contract to refer to the > column from the underlying datasource. > 2. To keep it consistent with *FieldReferenceExpression* > > Having said that, I see that with *projection pushdown, *index of the > fields are used whereas with *filter pushdown (*based on scanning few > tablesources) *FieldReferenceExpression*'s name is used for eg: even in > the Flink's *FileSystemTableSource, IcebergSource, JDBCDatsource*. This > way, I feel the contract is not quite clear and explicit. Wanted to > understand other's thoughts as well. > > Regards > Venkata krishnan > > > On Tue, Sep 5, 2023 at 5:34 PM Becket Qin <becket....@gmail.com> wrote: > >> Hi Venkata, >> >> >> > Also I made minor changes to the *NestedFieldReferenceExpression, >> *instead >> > of *fieldIndexArray* we can just do away with *fieldNames *array that >> > includes fieldName at every level for the nested field. >> >> >> I don't think keeping only the field names array would work. At the end of >> the day, the contract between Flink SQL and the connectors is based on the >> indexes, not the names. Technically speaking, the connectors only emit a >> bunch of RowData which is based on positions. The field names are added by >> the SQL framework via the DDL for those RowData. In this sense, the >> connectors may not be aware of the field names in Flink DDL at all. The >> common language between Flink SQL and source is just positions. This is >> also why ProjectionPushDown would work by only relying on the indexes, not >> the field names. So I think the field index array is a must have here in >> the NestedFieldReferenceExpression. >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Fri, Sep 1, 2023 at 8:12 AM Venkatakrishnan Sowrirajan < >> vsowr...@asu.edu> >> wrote: >> >> > Gentle ping on the vote for FLIP-356: Support Nested fields filter >> pushdown >> > < >> https://urldefense.com/v3/__https://www.mail-archive.com/dev@flink.apache.org/msg69289.html__;!!IKRxdwAv5BmarQ!bOW26WlafOQQcb32eWtUiXBAl0cTCK1C6iYhDI2f_z__eczudAWmTRvjDiZg6gzlXmPXrDV4KJS5cFxagFE$ >> >. >> > >> > Regards >> > Venkata krishnan >> > >> > >> > On Tue, Aug 29, 2023 at 9:18 PM Venkatakrishnan Sowrirajan < >> > vsowr...@asu.edu> >> > wrote: >> > >> > > Sure, will reference this discussion to resume where we started as >> part >> > of >> > > the flip to refactor SupportsProjectionPushDown. >> > > >> > > On Tue, Aug 29, 2023, 7:22 PM Jark Wu <imj...@gmail.com> wrote: >> > > >> > >> I'm fine with this. `ReferenceExpression` and >> > `SupportsProjectionPushDown` >> > >> can be another FLIP. However, could you summarize the design of this >> > part >> > >> in the future part of the FLIP? This can be easier to get started >> with >> > in >> > >> the future. >> > >> >> > >> >> > >> Best, >> > >> Jark >> > >> >> > >> >> > >> On Wed, 30 Aug 2023 at 02:45, Venkatakrishnan Sowrirajan < >> > >> vsowr...@asu.edu> >> > >> wrote: >> > >> >> > >> > Thanks Jark. Sounds good. >> > >> > >> > >> > One more thing, earlier in my summary I mentioned, >> > >> > >> > >> > Introduce a new *ReferenceExpression* (or >> *BaseReferenceExpression*) >> > >> > > abstract class which will be extended by both >> > >> *FieldReferenceExpression* >> > >> > > and *NestedFieldReferenceExpression* (to be introduced as part >> of >> > >> this >> > >> > > FLIP) >> > >> > >> > >> > This can be punted for now and can be handled as part of >> refactoring >> > >> > SupportsProjectionPushDown. >> > >> > >> > >> > Also I made minor changes to the *NestedFieldReferenceExpression, >> > >> *instead >> > >> > of *fieldIndexArray* we can just do away with *fieldNames *array >> that >> > >> > includes fieldName at every level for the nested field. >> > >> > >> > >> > Updated the FLIP-357 >> > >> > < >> > >> > >> > >> >> > >> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!YAk6kV4CYvUSPfpoUDQRs6VlbmJXVX8KOKqFxKbNDkUWKzShvwpkLRGkAV1tgV3EqClNrjGS-Ij86Q$ >> > >> > > >> > >> > wiki as well. >> > >> > >> > >> > Regards >> > >> > Venkata krishnan >> > >> > >> > >> > >> > >> > On Tue, Aug 29, 2023 at 5:21 AM Jark Wu <imj...@gmail.com> wrote: >> > >> > >> > >> > > Hi Venkata, >> > >> > > >> > >> > > Your summary looks good to me. +1 to start a vote. >> > >> > > >> > >> > > I think we don't need "inputIndex" in >> > NestedFieldReferenceExpression. >> > >> > > Actually, I think it is also not needed in >> FieldReferenceExpression, >> > >> > > and we should try to remove it (another topic). The RexInputRef >> in >> > >> > Calcite >> > >> > > also doesn't require an inputIndex because the field index should >> > >> > represent >> > >> > > index of the field in the underlying row type. Field references >> > >> shouldn't >> > >> > > be >> > >> > > aware of the number of inputs. >> > >> > > >> > >> > > Best, >> > >> > > Jark >> > >> > > >> > >> > > >> > >> > > On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan < >> > >> > vsowr...@asu.edu >> > >> > > > >> > >> > > wrote: >> > >> > > >> > >> > > > Hi Jinsong, >> > >> > > > >> > >> > > > Thanks for your comments. >> > >> > > > >> > >> > > > What is inputIndex in NestedFieldReferenceExpression? >> > >> > > > >> > >> > > > >> > >> > > > I haven't looked at it before. Do you mean, given that it is >> now >> > >> only >> > >> > > used >> > >> > > > to push filters it won't be subsequently used in further >> > >> > > > planning/optimization and therefore it is not required at this >> > time? >> > >> > > > >> > >> > > > So if NestedFieldReferenceExpression doesn't need inputIndex, >> is >> > >> there >> > >> > > > > a need to introduce a base class `ReferenceExpression`? >> > >> > > > >> > >> > > > For SupportsFilterPushDown itself, *ReferenceExpression* base >> > class >> > >> is >> > >> > > not >> > >> > > > needed. But there were discussions around cleaning up and >> > >> standardizing >> > >> > > the >> > >> > > > API for Supports*PushDown. SupportsProjectionPushDown currently >> > >> pushes >> > >> > > the >> > >> > > > projects as a 2-d array, instead it would be better to use the >> > >> standard >> > >> > > API >> > >> > > > which seems to be the *ResolvedExpression*. For >> > >> > > SupportsProjectionPushDown >> > >> > > > either FieldReferenceExpression (top level fields) or >> > >> > > > NestedFieldReferenceExpression (nested fields) is enough, in >> order >> > >> to >> > >> > > > provide a single API that handles both top level and nested >> > fields, >> > >> > > > ReferenceExpression will be introduced as a base class. >> > >> > > > >> > >> > > > Eventually, *SupportsProjectionPushDown#applyProjections* would >> > >> evolve >> > >> > as >> > >> > > > applyProjection(List<ReferenceExpression> projectedFields) and >> > >> nested >> > >> > > > fields would be pushed only if *supportsNestedProjections* >> returns >> > >> > true. >> > >> > > > >> > >> > > > Regards >> > >> > > > Venkata krishnan >> > >> > > > >> > >> > > > >> > >> > > > On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li < >> > >> jingsongl...@gmail.com> >> > >> > > > wrote: >> > >> > > > >> > >> > > > > So if NestedFieldReferenceExpression doesn't need >> inputIndex, is >> > >> > there >> > >> > > > > a need to introduce a base class `ReferenceExpression`? >> > >> > > > > >> > >> > > > > Best, >> > >> > > > > Jingsong >> > >> > > > > >> > >> > > > > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li < >> > >> jingsongl...@gmail.com> >> > >> > > > > wrote: >> > >> > > > > > >> > >> > > > > > Hi thanks all for your discussion. >> > >> > > > > > >> > >> > > > > > What is inputIndex in NestedFieldReferenceExpression? >> > >> > > > > > >> > >> > > > > > I know inputIndex has special usage in >> > FieldReferenceExpression, >> > >> > but >> > >> > > > > > it is only for Join operators, and it is only for SQL >> > >> optimization. >> > >> > > It >> > >> > > > > > looks like there is no requirement for Nested. >> > >> > > > > > >> > >> > > > > > Best, >> > >> > > > > > Jingsong >> > >> > > > > > >> > >> > > > > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan >> > >> > > > > > <vsowr...@asu.edu> wrote: >> > >> > > > > > > >> > >> > > > > > > Thanks for all the feedback and discussion everyone. >> Looks >> > >> like >> > >> > we >> > >> > > > have >> > >> > > > > > > reached a consensus here. >> > >> > > > > > > >> > >> > > > > > > Just to summarize: >> > >> > > > > > > >> > >> > > > > > > 1. Introduce a new *ReferenceExpression* (or >> > >> > > > *BaseReferenceExpression*) >> > >> > > > > > > abstract class which will be extended by both >> > >> > > > > *FieldReferenceExpression* >> > >> > > > > > > and *NestedFieldReferenceExpression* (to be introduced as >> > >> part of >> > >> > > > this >> > >> > > > > FLIP) >> > >> > > > > > > 2. No need of *supportsNestedFilters *check as the >> current >> > >> > > > > > > *SupportsFilterPushDown* should already ignore unknown >> > >> > expressions >> > >> > > ( >> > >> > > > > > > *NestedFieldReferenceExpression* for example) and return >> > them >> > >> as >> > >> > > > > > > *remainingFilters. >> > >> > > > > > > *Maybe this should be clarified explicitly in the >> Javadoc of >> > >> > > > > > > *SupportsFilterPushDown. >> > >> > > > > > > *I will file a separate JIRA to fix the documentation. >> > >> > > > > > > 3. Refactor *SupportsProjectionPushDown* to use >> > >> > > *ReferenceExpression >> > >> > > > > *instead >> > >> > > > > > > of existing 2-d arrays to consolidate and be consistent >> with >> > >> > other >> > >> > > > > > > Supports*PushDown APIs - *outside the scope of this FLIP* >> > >> > > > > > > 4. Similarly *SupportsAggregatePushDown* should also be >> > >> evolved >> > >> > > > > whenever >> > >> > > > > > > nested fields support is added to use the >> > >> *ReferenceExpression - >> > >> > > > > **outside >> > >> > > > > > > the scope of this FLIP* >> > >> > > > > > > >> > >> > > > > > > Does this sound good? Please let me know if I have missed >> > >> > anything >> > >> > > > > here. If >> > >> > > > > > > there are no concerns, I will start a vote tomorrow. I >> will >> > >> also >> > >> > > get >> > >> > > > > the >> > >> > > > > > > FLIP-356 wiki updated. Thanks everyone once again! >> > >> > > > > > > >> > >> > > > > > > Regards >> > >> > > > > > > Venkata krishnan >> > >> > > > > > > >> > >> > > > > > > >> > >> > > > > > > On Thu, Aug 24, 2023 at 8:19 PM Becket Qin < >> > >> becket....@gmail.com >> > >> > > >> > >> > > > > wrote: >> > >> > > > > > > >> > >> > > > > > > > Hi Jark, >> > >> > > > > > > > >> > >> > > > > > > > How about having a separate >> > NestedFieldReferenceExpression, >> > >> and >> > >> > > > > > > > > abstracting a common base class "ReferenceExpression" >> > for >> > >> > > > > > > > > NestedFieldReferenceExpression and >> > >> FieldReferenceExpression? >> > >> > > This >> > >> > > > > makes >> > >> > > > > > > > > unifying expressions in >> > >> > > > > > > > > >> > >> > > > > >> > >> > >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> >> > >> > > > > > > > > ...)" >> > >> > > > > > > > > possible. >> > >> > > > > > > > >> > >> > > > > > > > >> > >> > > > > > > > I'd be fine with this. It at least provides a >> consistent >> > API >> > >> > > style >> > >> > > > / >> > >> > > > > > > > formality. >> > >> > > > > > > > >> > >> > > > > > > > Re: Yunhong, >> > >> > > > > > > > >> > >> > > > > > > > 3. Finally, I think we need to look at the costs and >> > >> benefits >> > >> > of >> > >> > > > > unifying >> > >> > > > > > > > > the SupportsFilterPushDown and >> > SupportsProjectionPushDown >> > >> (or >> > >> > > > > others) >> > >> > > > > > > > from >> > >> > > > > > > > > the perspective of interface implementers. A stable >> API >> > >> can >> > >> > > > reduce >> > >> > > > > user >> > >> > > > > > > > > development and change costs, if the current API can >> > fully >> > >> > meet >> > >> > > > the >> > >> > > > > > > > > functional requirements at the framework level, I >> > personal >> > >> > > > suggest >> > >> > > > > > > > reducing >> > >> > > > > > > > > the impact on connector developers. >> > >> > > > > > > > > >> > >> > > > > > > > >> > >> > > > > > > > I agree that the cost and benefit should be measured. >> And >> > >> the >> > >> > > > > measurement >> > >> > > > > > > > should be in the long term instead of short term. That >> is >> > >> why >> > >> > we >> > >> > > > > always >> > >> > > > > > > > need to align on the ideal end state first. >> > >> > > > > > > > Meeting functionality requirements is the bare minimum >> bar >> > >> for >> > >> > an >> > >> > > > > API. >> > >> > > > > > > > Simplicity, intuitiveness, robustness and evolvability >> are >> > >> also >> > >> > > > > important. >> > >> > > > > > > > In addition, for projects with many APIs, such as >> Flink, a >> > >> > > > > consistent API >> > >> > > > > > > > style is also critical for the user adoption as well as >> > bug >> > >> > > > > avoidance. It >> > >> > > > > > > > is very helpful for the community to agree on some API >> > >> design >> > >> > > > > conventions / >> > >> > > > > > > > principles. >> > >> > > > > > > > For example, in this particular case, via our >> discussion, >> > >> > > hopefully >> > >> > > > > we sort >> > >> > > > > > > > of established the following API design conventions / >> > >> > principles >> > >> > > > for >> > >> > > > > all >> > >> > > > > > > > the Supports*PushDown interfaces. >> > >> > > > > > > > >> > >> > > > > > > > 1. By default, expressions should be used if applicable >> > >> instead >> > >> > > of >> > >> > > > > other >> > >> > > > > > > > representations. >> > >> > > > > > > > 2. In general, the pushdown method should not assume >> all >> > the >> > >> > > > > pushdowns will >> > >> > > > > > > > succeed. So the applyX() method should return a >> boolean or >> > >> > > List<X>, >> > >> > > > > to >> > >> > > > > > > > handle the cases that some of the pushdowns cannot be >> > >> fulfilled >> > >> > > by >> > >> > > > > the >> > >> > > > > > > > implementation. >> > >> > > > > > > > >> > >> > > > > > > > Establishing such conventions and principles demands >> > careful >> > >> > > > > thinking for >> > >> > > > > > > > the aspects I mentioned earlier in addition to the API >> > >> > > > > functionalities. >> > >> > > > > > > > This helps lower the bar of understanding, reduces the >> > >> chance >> > >> > of >> > >> > > > > having >> > >> > > > > > > > loose ends in the API, and will benefit all the >> > >> participants in >> > >> > > the >> > >> > > > > project >> > >> > > > > > > > over time. I think this is the right way to achieve >> real >> > API >> > >> > > > > stability. >> > >> > > > > > > > Otherwise, we may end up chasing our tails to find ways >> > not >> > >> to >> > >> > > > > change the >> > >> > > > > > > > existing non-ideal APIs. >> > >> > > > > > > > >> > >> > > > > > > > Thanks, >> > >> > > > > > > > >> > >> > > > > > > > Jiangjie (Becket) Qin >> > >> > > > > > > > >> > >> > > > > > > > On Fri, Aug 25, 2023 at 9:33 AM yh z < >> > >> zhengyunhon...@gmail.com >> > >> > > >> > >> > > > > wrote: >> > >> > > > > > > > >> > >> > > > > > > > > Hi, Venkat, >> > >> > > > > > > > > >> > >> > > > > > > > > Thanks for the FLIP, it sounds good to support nested >> > >> fields >> > >> > > > filter >> > >> > > > > > > > > pushdown. Based on the design of flip and the above >> > >> options, >> > >> > I >> > >> > > > > would like >> > >> > > > > > > > > to make a few suggestions: >> > >> > > > > > > > > >> > >> > > > > > > > > 1. At present, introducing >> > NestedFieldReferenceExpression >> > >> > > looks >> > >> > > > > like a >> > >> > > > > > > > > better solution, which can fully meet our >> requirements >> > >> while >> > >> > > > > reducing >> > >> > > > > > > > > modifications to base class >> FieldReferenceExpression. In >> > >> the >> > >> > > long >> > >> > > > > run, I >> > >> > > > > > > > > tend to abstract a basic class for >> > >> > > NestedFieldReferenceExpression >> > >> > > > > and >> > >> > > > > > > > > FieldReferenceExpression as u suggested. >> > >> > > > > > > > > >> > >> > > > > > > > > 2. Personally, I don't recommend introducing >> > >> > > > > *supportsNestedFilters() in >> > >> > > > > > > > > supportsFilterPushdown. We just need to better >> declare >> > the >> > >> > > return >> > >> > > > > value >> > >> > > > > > > > of >> > >> > > > > > > > > the method *applyFilters. >> > >> > > > > > > > > >> > >> > > > > > > > > 3. Finally, I think we need to look at the costs and >> > >> benefits >> > >> > > of >> > >> > > > > unifying >> > >> > > > > > > > > the SupportsFilterPushDown and >> > SupportsProjectionPushDown >> > >> (or >> > >> > > > > others) >> > >> > > > > > > > from >> > >> > > > > > > > > the perspective of interface implementers. A stable >> API >> > >> can >> > >> > > > reduce >> > >> > > > > user >> > >> > > > > > > > > development and change costs, if the current API can >> > fully >> > >> > meet >> > >> > > > the >> > >> > > > > > > > > functional requirements at the framework level, I >> > personal >> > >> > > > suggest >> > >> > > > > > > > reducing >> > >> > > > > > > > > the impact on connector developers. >> > >> > > > > > > > > >> > >> > > > > > > > > Regards, >> > >> > > > > > > > > Yunhong Zheng (Swuferhong) >> > >> > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> >> > >> 于2023年8月25日周五 >> > >> > > > > 01:25写道: >> > >> > > > > > > > > >> > >> > > > > > > > > > To keep it backwards compatible, introduce another >> API >> > >> > > > > *applyAggregates >> > >> > > > > > > > > > *with >> > >> > > > > > > > > > *List<ReferenceExpression> *when nested field >> support >> > is >> > >> > > added >> > >> > > > > and >> > >> > > > > > > > > > deprecate the current API. This will by default >> throw >> > an >> > >> > > > > exception. In >> > >> > > > > > > > > > flink planner, *applyAggregates *with nested fields >> > and >> > >> if >> > >> > it >> > >> > > > > throws >> > >> > > > > > > > > > exception then *applyAggregates* without nested >> > fields. >> > >> > > > > > > > > > >> > >> > > > > > > > > > Regards >> > >> > > > > > > > > > Venkata krishnan >> > >> > > > > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > > On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan >> > >> > Sowrirajan < >> > >> > > > > > > > > > vsowr...@asu.edu> wrote: >> > >> > > > > > > > > > >> > >> > > > > > > > > > > Jark, >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > How about having a separate >> > >> > NestedFieldReferenceExpression, >> > >> > > > and >> > >> > > > > > > > > > >> abstracting a common base class >> > "ReferenceExpression" >> > >> > for >> > >> > > > > > > > > > >> NestedFieldReferenceExpression and >> > >> > > FieldReferenceExpression? >> > >> > > > > This >> > >> > > > > > > > > makes >> > >> > > > > > > > > > >> unifying expressions in >> > >> > > > > > > > > > >> >> > >> > > > > > > > >> > >> > > > > >> > >> > >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> >> > >> > > > > > > > > > >> ...)" >> > >> > > > > > > > > > >> possible. >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > This should be fine for >> *SupportsProjectionPushDown* >> > >> and >> > >> > > > > > > > > > > *SupportsFilterPushDown*. One concern in the >> case of >> > >> > > > > > > > > > > *SupportsAggregatePushDown* with nested fields >> > support >> > >> > (to >> > >> > > be >> > >> > > > > added >> > >> > > > > > > > in >> > >> > > > > > > > > > > the future), with this proposal, the API will >> become >> > >> > > > backwards >> > >> > > > > > > > > > incompatible >> > >> > > > > > > > > > > as the *args *for the aggregate function is >> > >> > > > > > > > > > *List<FieldReferenceExpression> >> > >> > > > > > > > > > > *that needs to change to >> > *List<ReferenceExpression>*. >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > Regards >> > >> > > > > > > > > > > Venkata krishnan >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu < >> > >> > imj...@gmail.com> >> > >> > > > > wrote: >> > >> > > > > > > > > > > >> > >> > > > > > > > > > >> Hi Becket, >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> I think it is the second case, that a >> > >> > > > > FieldReferenceExpression is >> > >> > > > > > > > > > >> constructed >> > >> > > > > > > > > > >> by the framework and passed to the connector >> > >> (interfaces >> > >> > > > > listed by >> > >> > > > > > > > > > >> Venkata[1] >> > >> > > > > > > > > > >> and Catalog#listPartitionsByFilter). Besides, >> > >> > > understanding >> > >> > > > > the >> > >> > > > > > > > nested >> > >> > > > > > > > > > >> field >> > >> > > > > > > > > > >> is optional for users/connectors (just treat it >> as >> > an >> > >> > > > unknown >> > >> > > > > > > > > expression >> > >> > > > > > > > > > >> if >> > >> > > > > > > > > > >> the >> > >> > > > > > > > > > >> connector doesn't want to support it). >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> If we extend FieldReferenceExpression, in the >> case >> > of >> > >> > > "where >> > >> > > > > > > > > col.nested >> > >> > > > > > > > > > > >> > >> > > > > > > > > > >> 10", >> > >> > > > > > > > > > >> for the connectors already supported >> filter/delete >> > >> > > pushdown, >> > >> > > > > they >> > >> > > > > > > > may >> > >> > > > > > > > > > >> wrongly >> > >> > > > > > > > > > >> pushdown "col > 10" instead of "nested > 10" >> > because >> > >> > they >> > >> > > > > still >> > >> > > > > > > > treat >> > >> > > > > > > > > > >> FieldReferenceExpression as a top-level column. >> > This >> > >> > > problem >> > >> > > > > can be >> > >> > > > > > > > > > >> resolved >> > >> > > > > > > > > > >> by introducing an additional >> > >> "supportedNestedPushdown" >> > >> > for >> > >> > > > > each >> > >> > > > > > > > > > interface, >> > >> > > > > > > > > > >> but that method is not elegant and is hard to >> > remove >> > >> in >> > >> > > the >> > >> > > > > future, >> > >> > > > > > > > > and >> > >> > > > > > > > > > >> this could >> > >> > > > > > > > > > >> be avoided if we have a separate >> > >> > > > > NestedFieldReferenceExpression. >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> If we want to extend FieldReferenceExpression, >> we >> > >> have >> > >> > to >> > >> > > > add >> > >> > > > > > > > > > protections >> > >> > > > > > > > > > >> for every related API in one shot. Besides, >> > >> > > > > FieldReferenceExpression >> > >> > > > > > > > > is >> > >> > > > > > > > > > a >> > >> > > > > > > > > > >> fundamental class in the planner, we have to go >> > >> through >> > >> > > all >> > >> > > > > the code >> > >> > > > > > > > > > that >> > >> > > > > > > > > > >> is using it to make sure it properly handling >> it if >> > >> it >> > >> > is >> > >> > > a >> > >> > > > > nested >> > >> > > > > > > > > field >> > >> > > > > > > > > > >> which >> > >> > > > > > > > > > >> is a big effort for the community. >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> If we were designing this API on day 1, I fully >> > >> support >> > >> > > > > merging them >> > >> > > > > > > > > in >> > >> > > > > > > > > > a >> > >> > > > > > > > > > >> FieldReferenceExpression. But in this case, I'm >> > >> thinking >> > >> > > > > about how >> > >> > > > > > > > to >> > >> > > > > > > > > > >> provide >> > >> > > > > > > > > > >> users with a smooth migration path, and allow >> the >> > >> > > community >> > >> > > > to >> > >> > > > > > > > > gradually >> > >> > > > > > > > > > >> put efforts into evolving the API, and not block >> > the >> > >> > > "Nested >> > >> > > > > Fields >> > >> > > > > > > > > > Filter >> > >> > > > > > > > > > >> Pushdown" >> > >> > > > > > > > > > >> requirement. >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> How about having a separate >> > >> > > NestedFieldReferenceExpression, >> > >> > > > > and >> > >> > > > > > > > > > >> abstracting a common base class >> > "ReferenceExpression" >> > >> > for >> > >> > > > > > > > > > >> NestedFieldReferenceExpression and >> > >> > > FieldReferenceExpression? >> > >> > > > > This >> > >> > > > > > > > > makes >> > >> > > > > > > > > > >> unifying expressions in >> > >> > > > > > > > > > >> >> > >> > > > > > > > >> > >> > > > > >> > >> > >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> >> > >> > > > > > > > > > >> ...)" >> > >> > > > > > > > > > >> possible. >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> Best, >> > >> > > > > > > > > > >> Jark >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan >> > >> > Sowrirajan < >> > >> > > > > > > > > > >> vsowr...@asu.edu> >> > >> > > > > > > > > > >> wrote: >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> > Becket and Jark, >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > Deprecate all the other >> > >> > > > > > > > > > >> > > methods except tryApplyFilters() and >> > >> > > > > tryApplyProjections(). >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > For *SupportsProjectionPushDown*, we still >> need a >> > >> > > > > > > > > > >> > *supportsNestedProjections* API on the table >> > >> source as >> > >> > > > some >> > >> > > > > of the >> > >> > > > > > > > > > table >> > >> > > > > > > > > > >> > sources might not be able to handle nested >> fields >> > >> and >> > >> > > > > therefore >> > >> > > > > > > > the >> > >> > > > > > > > > > >> Flink >> > >> > > > > > > > > > >> > planner should not push down the nested >> > >> projections or >> > >> > > > else >> > >> > > > > the >> > >> > > > > > > > > > >> > *applyProjection >> > >> > > > > > > > > > >> > *API has to be appropriately changed to return >> > >> > > > > > > > > > >> > *unconvertibleProjections *similar >> > >> > > > > > > > > > >> > to *SupportsFilterPushDown*. >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > Or we have to introduce two different >> > >> > applyProjections() >> > >> > > > > > > > > > >> > > methods for FieldReferenceExpression / >> > >> > > > > > > > > > NestedFieldReferenceExpression >> > >> > > > > > > > > > >> > > respectively. >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > Agree this is not preferred. Given that >> > >> > > > > *supportNestedProjections >> > >> > > > > > > > > > >> *cannot >> > >> > > > > > > > > > >> > be deprecated/removed based on the current API >> > >> form, >> > >> > > > > extending >> > >> > > > > > > > > > >> > *FieldReferenceExpression* to support nested >> > fields >> > >> > > should >> > >> > > > > be >> > >> > > > > > > > okay. >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > Another alternative could be to change >> > >> > *applyProjections >> > >> > > > > *to take >> > >> > > > > > > > > > >> > List<ResolvedExpression> and on the connector >> > side >> > >> > they >> > >> > > > > choose to >> > >> > > > > > > > > > handle >> > >> > > > > > > > > > >> > *FieldReferenceExpression* and >> > >> > > > > *NestedFieldReferenceExpression *as >> > >> > > > > > > > > > >> > applicable and return the >> remainingProjections. >> > In >> > >> the >> > >> > > > case >> > >> > > > > of >> > >> > > > > > > > > nested >> > >> > > > > > > > > > >> field >> > >> > > > > > > > > > >> > projections not supported, it should return >> them >> > >> back >> > >> > > but >> > >> > > > > only >> > >> > > > > > > > > > >> projecting >> > >> > > > > > > > > > >> > the top level fields. IMO, this is also *not >> > >> > preferred*. >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > *SupportsAggregatePushDown* >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > *AggregateExpression *currently takes in a >> list >> > of >> > >> > > > > > > > > > >> > *FieldReferenceExpression* as args for the >> > >> aggregate >> > >> > > > > function, if >> > >> > > > > > > > in >> > >> > > > > > > > > > >> future >> > >> > > > > > > > > > >> > *SupportsAggregatePushDown* adds support for >> > >> aggregate >> > >> > > > > pushdown on >> > >> > > > > > > > > > >> nested >> > >> > > > > > > > > > >> > fields then the AggregateExpression API also >> has >> > to >> > >> > > change >> > >> > > > > if a >> > >> > > > > > > > new >> > >> > > > > > > > > > >> > NestedFieldReferenceExpression is introduced >> for >> > >> > nested >> > >> > > > > fields. >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > If we add a >> > >> > > > > > > > > > >> > > flag for each new filter, >> > >> > > > > > > > > > >> > > the interface will be filled with lots of >> flags >> > >> > (e.g., >> > >> > > > > > > > > > >> supportsBetween, >> > >> > > > > > > > > > >> > > supportsIN) >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > In an ideal situation, I completely agree with >> > you. >> > >> > But >> > >> > > in >> > >> > > > > the >> > >> > > > > > > > > current >> > >> > > > > > > > > > >> > state, *supportsNestedFilters* can act as a >> > bridge >> > >> to >> > >> > > > reach >> > >> > > > > the >> > >> > > > > > > > > > eventual >> > >> > > > > > > > > > >> > desired state which is to have a clean and >> > >> consistent >> > >> > > set >> > >> > > > > of APIs >> > >> > > > > > > > > > >> > throughout all Supports*PushDown. >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > Also shared some thoughts on the end state API >> > >> > > > > > > > > > >> > < >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > >> >> > >> https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$ >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > with extension to the >> *FieldReferenceExpression* >> > to >> > >> > > > support >> > >> > > > > nested >> > >> > > > > > > > > > >> fields. >> > >> > > > > > > > > > >> > Please take a look. >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > Regards >> > >> > > > > > > > > > >> > Venkata krishnan >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin < >> > >> > > > > becket....@gmail.com> >> > >> > > > > > > > > > >> wrote: >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> > > Hi Jark, >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > Regarding the migration path, it would be >> > useful >> > >> to >> > >> > > > > scrutinize >> > >> > > > > > > > the >> > >> > > > > > > > > > use >> > >> > > > > > > > > > >> > case >> > >> > > > > > > > > > >> > > of FiledReferenceExpression and >> > >> ResolvedExpressions. >> > >> > > > > There are >> > >> > > > > > > > two >> > >> > > > > > > > > > >> kinds >> > >> > > > > > > > > > >> > of >> > >> > > > > > > > > > >> > > use cases: >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > 1. A ResolvedExpression is constructed by >> the >> > >> user >> > >> > or >> > >> > > > > connector >> > >> > > > > > > > / >> > >> > > > > > > > > > >> plugin >> > >> > > > > > > > > > >> > > developers. >> > >> > > > > > > > > > >> > > 2. A ResolvedExpression is constructed by >> the >> > >> > > framework >> > >> > > > > and >> > >> > > > > > > > passed >> > >> > > > > > > > > > to >> > >> > > > > > > > > > >> > user >> > >> > > > > > > > > > >> > > or connector / plugin developers. >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > For the first case, both of the approaches >> > >> provide >> > >> > the >> > >> > > > > same >> > >> > > > > > > > > > migration >> > >> > > > > > > > > > >> > > experience. >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > For the second case, generally speaking, >> > >> introducing >> > >> > > > > > > > > > >> > > NestedFieldReferenceExpression and extending >> > >> > > > > > > > > > FieldReferenceExpression >> > >> > > > > > > > > > >> > would >> > >> > > > > > > > > > >> > > have the same impact for backwards >> > compatibility. >> > >> > > > > > > > > > >> SupportsFilterPushDown >> > >> > > > > > > > > > >> > is >> > >> > > > > > > > > > >> > > a special case here because understanding >> the >> > >> filter >> > >> > > > > expressions >> > >> > > > > > > > > is >> > >> > > > > > > > > > >> > > optional for the source implementation. In >> > other >> > >> use >> > >> > > > > cases, if >> > >> > > > > > > > > > >> > > understanding the reference to a nested >> field >> > is >> > >> a >> > >> > > must >> > >> > > > > have, >> > >> > > > > > > > the >> > >> > > > > > > > > > user >> > >> > > > > > > > > > >> > code >> > >> > > > > > > > > > >> > > has to be changed, regardless of which >> approach >> > >> we >> > >> > > take >> > >> > > > to >> > >> > > > > > > > support >> > >> > > > > > > > > > >> nested >> > >> > > > > > > > > > >> > > fields. >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > Therefore, I think we have to check each >> public >> > >> API >> > >> > > > where >> > >> > > > > the >> > >> > > > > > > > > nested >> > >> > > > > > > > > > >> > field >> > >> > > > > > > > > > >> > > reference is exposed. If we have many public >> > APIs >> > >> > > where >> > >> > > > > > > > > > understanding >> > >> > > > > > > > > > >> > > nested fields is optional for the user / >> > plugin >> > >> / >> > >> > > > > connector >> > >> > > > > > > > > > >> developers, >> > >> > > > > > > > > > >> > > having a separate >> > NestedFieldReferenceExpression >> > >> > would >> > >> > > > > have a >> > >> > > > > > > > more >> > >> > > > > > > > > > >> smooth >> > >> > > > > > > > > > >> > > migration. Otherwise, there seems to be no >> > >> > difference >> > >> > > > > between >> > >> > > > > > > > the >> > >> > > > > > > > > > two >> > >> > > > > > > > > > >> > > approaches. >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > Migration path aside, the main reason I >> prefer >> > >> > > extending >> > >> > > > > > > > > > >> > > FieldReferenceExpression over a new >> > >> > > > > > > > NestedFieldReferenceExpression >> > >> > > > > > > > > > is >> > >> > > > > > > > > > >> > > because this makes the >> > SupportsProjectionPushDown >> > >> > > > > interface >> > >> > > > > > > > > simpler. >> > >> > > > > > > > > > >> > > Otherwise, we have to treat it as a special >> > case >> > >> > that >> > >> > > > > does not >> > >> > > > > > > > > match >> > >> > > > > > > > > > >> the >> > >> > > > > > > > > > >> > > overall API style. Or we have to introduce >> two >> > >> > > different >> > >> > > > > > > > > > >> > applyProjections() >> > >> > > > > > > > > > >> > > methods for FieldReferenceExpression / >> > >> > > > > > > > > > NestedFieldReferenceExpression >> > >> > > > > > > > > > >> > > respectively. This issue further extends to >> > >> > > > > implementation in >> > >> > > > > > > > > > >> addition to >> > >> > > > > > > > > > >> > > public API. A single >> FieldReferenceExpression >> > >> might >> > >> > > help >> > >> > > > > > > > simplify >> > >> > > > > > > > > > the >> > >> > > > > > > > > > >> > > implementation code a little bit. For >> example, >> > >> in a >> > >> > > > > recursive >> > >> > > > > > > > > > >> processing >> > >> > > > > > > > > > >> > of >> > >> > > > > > > > > > >> > > a row with nested rows, we may not need to >> > switch >> > >> > > > between >> > >> > > > > > > > > > >> > > FieldReferenceExpression and >> > >> > > > > NestedFieldReferenceExpression >> > >> > > > > > > > > > depending >> > >> > > > > > > > > > >> on >> > >> > > > > > > > > > >> > > whether the record being processed is a top >> > level >> > >> > > record >> > >> > > > > or >> > >> > > > > > > > nested >> > >> > > > > > > > > > >> > record. >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > Thanks, >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > Jiangjie (Becket) Qin >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu < >> > >> > > > > imj...@gmail.com> >> > >> > > > > > > > > wrote: >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > > Hi Becket, >> > >> > > > > > > > > > >> > > > >> > >> > > > > > > > > > >> > > > I totally agree we should try to have a >> > >> consistent >> > >> > > API >> > >> > > > > for a >> > >> > > > > > > > > final >> > >> > > > > > > > > > >> > state. >> > >> > > > > > > > > > >> > > > The only concern I have mentioned is the >> > >> "smooth" >> > >> > > > > migration >> > >> > > > > > > > > path. >> > >> > > > > > > > > > >> > > > The FiledReferenceExpression is widely >> used >> > in >> > >> > many >> > >> > > > > public >> > >> > > > > > > > APIs, >> > >> > > > > > > > > > >> > > > not only in the SupportsFilterPushDown. >> Yes, >> > we >> > >> > can >> > >> > > > > change >> > >> > > > > > > > every >> > >> > > > > > > > > > >> > > > methods in 2-steps, but is it good to >> change >> > >> API >> > >> > > back >> > >> > > > > and >> > >> > > > > > > > forth >> > >> > > > > > > > > > for >> > >> > > > > > > > > > >> > this? >> > >> > > > > > > > > > >> > > > Personally, I'm fine with a separate >> > >> > > > > > > > > > NestedFieldReferenceExpression >> > >> > > > > > > > > > >> > > class. >> > >> > > > > > > > > > >> > > > TBH, I prefer the separated way because it >> > >> makes >> > >> > the >> > >> > > > > reference >> > >> > > > > > > > > > >> > expression >> > >> > > > > > > > > > >> > > > more clear and concise. >> > >> > > > > > > > > > >> > > > >> > >> > > > > > > > > > >> > > > Best, >> > >> > > > > > > > > > >> > > > Jark >> > >> > > > > > > > > > >> > > > >> > >> > > > > > > > > > >> > > > >> > >> > > > > > > > > > >> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin < >> > >> > > > > > > > becket....@gmail.com> >> > >> > > > > > > > > > >> wrote: >> > >> > > > > > > > > > >> > > > >> > >> > > > > > > > > > >> > > > > Thanks for the reply, Jark. >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > I think it will be helpful to understand >> > the >> > >> > final >> > >> > > > > state we >> > >> > > > > > > > > want >> > >> > > > > > > > > > >> to >> > >> > > > > > > > > > >> > > > > eventually achieve first, then we can >> > discuss >> > >> > the >> > >> > > > > steps >> > >> > > > > > > > > towards >> > >> > > > > > > > > > >> that >> > >> > > > > > > > > > >> > > > final >> > >> > > > > > > > > > >> > > > > state. >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > It looks like there are two proposed end >> > >> states >> > >> > > now: >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > 1. Have a separate >> > >> > NestedFieldReferenceExpression >> > >> > > > > class; >> > >> > > > > > > > keep >> > >> > > > > > > > > > >> > > > > SupportsFilterPushDown and >> > >> > > > SupportsProjectionPushDown >> > >> > > > > the >> > >> > > > > > > > > same. >> > >> > > > > > > > > > >> It is >> > >> > > > > > > > > > >> > > > just >> > >> > > > > > > > > > >> > > > > a one step change. >> > >> > > > > > > > > > >> > > > > - Regarding the >> > >> > supportsNestedFilterPushDown() >> > >> > > > > method, if >> > >> > > > > > > > > our >> > >> > > > > > > > > > >> > > contract >> > >> > > > > > > > > > >> > > > > with the connector developer today is >> "The >> > >> > > > > implementation >> > >> > > > > > > > > should >> > >> > > > > > > > > > >> > ignore >> > >> > > > > > > > > > >> > > > > unrecognized expressions by putting them >> > into >> > >> > the >> > >> > > > > remaining >> > >> > > > > > > > > > >> filters, >> > >> > > > > > > > > > >> > > > > instead of throwing exceptions". Then >> there >> > >> is >> > >> > no >> > >> > > > > need for >> > >> > > > > > > > > this >> > >> > > > > > > > > > >> > > method. I >> > >> > > > > > > > > > >> > > > > am not sure about the current contract. >> We >> > >> > should >> > >> > > > > probably >> > >> > > > > > > > > make >> > >> > > > > > > > > > it >> > >> > > > > > > > > > >> > > clear >> > >> > > > > > > > > > >> > > > in >> > >> > > > > > > > > > >> > > > > the interface Java doc. >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > 2. Extend the existing >> > >> FiledReferenceExpression >> > >> > > > class >> > >> > > > > to >> > >> > > > > > > > > support >> > >> > > > > > > > > > >> > nested >> > >> > > > > > > > > > >> > > > > fields; SupportsFilterPushDown only has >> one >> > >> > method >> > >> > > > of >> > >> > > > > > > > > > >> > > > > applyFilters(List<ResolvedExpression>); >> > >> > > > > > > > > > SupportsProjectionPushDown >> > >> > > > > > > > > > >> > only >> > >> > > > > > > > > > >> > > > has >> > >> > > > > > > > > > >> > > > > one method of >> > >> > > > > > > > applyProjections(List<FieldReferenceExpression>, >> > >> > > > > > > > > > >> > > DataType). >> > >> > > > > > > > > > >> > > > > It could just be two steps if we are not >> > too >> > >> > > > obsessed >> > >> > > > > with >> > >> > > > > > > > the >> > >> > > > > > > > > > >> exact >> > >> > > > > > > > > > >> > > > names >> > >> > > > > > > > > > >> > > > > of "applyFilters" and >> "applyProjections". >> > >> More >> > >> > > > > specifically, >> > >> > > > > > > > > it >> > >> > > > > > > > > > >> takes >> > >> > > > > > > > > > >> > > two >> > >> > > > > > > > > > >> > > > > steps to achieve this final state: >> > >> > > > > > > > > > >> > > > > a. introduce a new method >> > >> > > > > > > > > > >> > tryApplyFilters(List<ResolvedExpression>) >> > >> > > > > > > > > > >> > > > to >> > >> > > > > > > > > > >> > > > > SupportsFilterPushDown, which may have >> > >> > > > > > > > > FiledReferenceExpression >> > >> > > > > > > > > > >> with >> > >> > > > > > > > > > >> > > > nested >> > >> > > > > > > > > > >> > > > > fields. The default implementation >> throws >> > an >> > >> > > > > exception. The >> > >> > > > > > > > > > >> runtime >> > >> > > > > > > > > > >> > > will >> > >> > > > > > > > > > >> > > > > first call tryApplyFilters() with nested >> > >> fields. >> > >> > > In >> > >> > > > > case of >> > >> > > > > > > > > > >> > exception, >> > >> > > > > > > > > > >> > > it >> > >> > > > > > > > > > >> > > > > calls the existing applyFilters() >> without >> > >> > > including >> > >> > > > > the >> > >> > > > > > > > nested >> > >> > > > > > > > > > >> > filters. >> > >> > > > > > > > > > >> > > > > Similarly, in >> SupportsProjectionPushDown, >> > >> > > introduce >> > >> > > > a >> > >> > > > > > > > > > >> > > > > >> > >> tryApplyProjections<List<NestedFieldReference> >> > >> > > > method >> > >> > > > > > > > > returning >> > >> > > > > > > > > > a >> > >> > > > > > > > > > >> > > Result. >> > >> > > > > > > > > > >> > > > > The Result also contains the accepted >> and >> > >> > > > unapplicable >> > >> > > > > > > > > > >> projections. >> > >> > > > > > > > > > >> > The >> > >> > > > > > > > > > >> > > > > default implementation also throws an >> > >> exception. >> > >> > > > > Deprecate >> > >> > > > > > > > all >> > >> > > > > > > > > > the >> > >> > > > > > > > > > >> > > other >> > >> > > > > > > > > > >> > > > > methods except tryApplyFilters() and >> > >> > > > > tryApplyProjections(). >> > >> > > > > > > > > > >> > > > > b. remove the deprecated methods in >> the >> > >> next >> > >> > > > major >> > >> > > > > > > > version >> > >> > > > > > > > > > >> bump. >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > Now the question is putting the >> migration >> > >> steps >> > >> > > > > aside, which >> > >> > > > > > > > > end >> > >> > > > > > > > > > >> > state >> > >> > > > > > > > > > >> > > do >> > >> > > > > > > > > > >> > > > > we prefer? While the first end state is >> > >> > acceptable >> > >> > > > > for me, >> > >> > > > > > > > > > >> > personally, >> > >> > > > > > > > > > >> > > I >> > >> > > > > > > > > > >> > > > > prefer the latter if we are designing >> from >> > >> > > scratch. >> > >> > > > > It is >> > >> > > > > > > > > clean, >> > >> > > > > > > > > > >> > > > consistent >> > >> > > > > > > > > > >> > > > > and intuitive. Given the size of Flink, >> > >> keeping >> > >> > > APIs >> > >> > > > > in the >> > >> > > > > > > > > same >> > >> > > > > > > > > > >> > style >> > >> > > > > > > > > > >> > > > over >> > >> > > > > > > > > > >> > > > > time is important. The migration is also >> > not >> > >> > that >> > >> > > > > > > > complicated. >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > Thanks, >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > Jiangjie (Becket) Qin >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu >> < >> > >> > > > > imj...@gmail.com> >> > >> > > > > > > > > > wrote: >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > > > Hi Venkat, >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > Thanks for the proposal. >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > I have some minor comments about the >> > FLIP. >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > 1. I think we don't need to >> > >> > > > > > > > > > >> > > > > > add >> > >> > > SupportsFilterPushDown#supportsNestedFilters() >> > >> > > > > method, >> > >> > > > > > > > > > >> > > > > > because connectors can skip nested >> > filters >> > >> by >> > >> > > > > putting them >> > >> > > > > > > > > in >> > >> > > > > > > > > > >> > > > > > Result#remainingFilters(). >> > >> > > > > > > > > > >> > > > > > And this is backward-compatible >> because >> > >> > unknown >> > >> > > > > > > > expressions >> > >> > > > > > > > > > were >> > >> > > > > > > > > > >> > > added >> > >> > > > > > > > > > >> > > > to >> > >> > > > > > > > > > >> > > > > > the remaining filters. >> > >> > > > > > > > > > >> > > > > > Planner should push predicate >> expressions >> > >> as >> > >> > > more >> > >> > > > as >> > >> > > > > > > > > possible. >> > >> > > > > > > > > > >> If >> > >> > > > > > > > > > >> > we >> > >> > > > > > > > > > >> > > > add >> > >> > > > > > > > > > >> > > > > a >> > >> > > > > > > > > > >> > > > > > flag for each new filter, >> > >> > > > > > > > > > >> > > > > > the interface will be filled with >> lots of >> > >> > flags >> > >> > > > > (e.g., >> > >> > > > > > > > > > >> > > supportsBetween, >> > >> > > > > > > > > > >> > > > > > supportsIN). >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > 2. >> > >> > > NestedFieldReferenceExpression#nestedFieldName >> > >> > > > > should >> > >> > > > > > > > be >> > >> > > > > > > > > an >> > >> > > > > > > > > > >> > array >> > >> > > > > > > > > > >> > > of >> > >> > > > > > > > > > >> > > > > > field names? >> > >> > > > > > > > > > >> > > > > > Each string represents a field name >> part >> > of >> > >> > the >> > >> > > > > field >> > >> > > > > > > > path. >> > >> > > > > > > > > > Just >> > >> > > > > > > > > > >> > keep >> > >> > > > > > > > > > >> > > > > > aligning with `nestedFieldIndexArray`. >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > 3. My concern about making >> > >> > > > FieldReferenceExpression >> > >> > > > > > > > support >> > >> > > > > > > > > > >> nested >> > >> > > > > > > > > > >> > > > fields >> > >> > > > > > > > > > >> > > > > > is the compatibility. >> > >> > > > > > > > > > >> > > > > > It is a public API and >> users/connectors >> > are >> > >> > > > already >> > >> > > > > using >> > >> > > > > > > > > it. >> > >> > > > > > > > > > >> > People >> > >> > > > > > > > > > >> > > > > > assumed it is a top-level column >> > >> > > > > > > > > > >> > > > > > reference, and applied logic on it. >> But >> > >> that's >> > >> > > not >> > >> > > > > true >> > >> > > > > > > > now >> > >> > > > > > > > > > and >> > >> > > > > > > > > > >> > this >> > >> > > > > > > > > > >> > > > may >> > >> > > > > > > > > > >> > > > > > lead to unexpected errors. >> > >> > > > > > > > > > >> > > > > > Having a separate >> > >> > NestedFieldReferenceExpression >> > >> > > > > sounds >> > >> > > > > > > > > safer >> > >> > > > > > > > > > to >> > >> > > > > > > > > > >> > me. >> > >> > > > > > > > > > >> > > > > Mixing >> > >> > > > > > > > > > >> > > > > > them in a class may >> > >> > > > > > > > > > >> > > > > > confuse users what's the meaning of >> > >> > > > getFieldName() >> > >> > > > > and >> > >> > > > > > > > > > >> > > > getFieldIndex(). >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > Regarding using >> > >> NestedFieldReferenceExpression >> > >> > > in >> > >> > > > > > > > > > >> > > > > > SupportsProjectionPushDown, do you >> > >> > > > > > > > > > >> > > > > > have any concerns @Timo Walther < >> > >> > > > twal...@apache.org> >> > >> > > > > ? >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > Best, >> > >> > > > > > > > > > >> > > > > > Jark >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > On Tue, 22 Aug 2023 at 05:55, >> > >> Venkatakrishnan >> > >> > > > > Sowrirajan < >> > >> > > > > > > > > > >> > > > > vsowr...@asu.edu >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > wrote: >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > > > Sounds like a great suggestion, >> Becket. >> > >> +1. >> > >> > > > Agree >> > >> > > > > with >> > >> > > > > > > > > > >> cleaning >> > >> > > > > > > > > > >> > up >> > >> > > > > > > > > > >> > > > the >> > >> > > > > > > > > > >> > > > > > APIs >> > >> > > > > > > > > > >> > > > > > > and making it consistent in all the >> > >> pushdown >> > >> > > > APIs. >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > Your suggested approach seems fine >> to >> > me, >> > >> > > unless >> > >> > > > > anyone >> > >> > > > > > > > > else >> > >> > > > > > > > > > >> has >> > >> > > > > > > > > > >> > > any >> > >> > > > > > > > > > >> > > > > > other >> > >> > > > > > > > > > >> > > > > > > concerns. Just have couple of >> > clarifying >> > >> > > > > questions: >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > 1. Do you think we should >> standardize >> > the >> > >> > APIs >> > >> > > > > across >> > >> > > > > > > > all >> > >> > > > > > > > > > the >> > >> > > > > > > > > > >> > > > pushdown >> > >> > > > > > > > > > >> > > > > > > supports like >> > SupportsPartitionPushdown, >> > >> > > > > > > > > > >> SupportsDynamicFiltering >> > >> > > > > > > > > > >> > > etc >> > >> > > > > > > > > > >> > > > > in >> > >> > > > > > > > > > >> > > > > > > the end state? >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > The current proposal works if we do >> not >> > >> want >> > >> > > to >> > >> > > > > migrate >> > >> > > > > > > > > > >> > > > > > > > SupportsFilterPushdown to also use >> > >> > > > > > > > > > >> > NestedFieldReferenceExpression >> > >> > > > > > > > > > >> > > > in >> > >> > > > > > > > > > >> > > > > > the >> > >> > > > > > > > > > >> > > > > > > > long term. >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > Did you mean >> *FieldReferenceExpression* >> > >> > > instead >> > >> > > > of >> > >> > > > > > > > > > >> > > > > > > *NestedFieldReferenceExpression*? >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > 2. Extend the >> FieldReferenceExpression >> > to >> > >> > > > support >> > >> > > > > nested >> > >> > > > > > > > > > >> fields. >> > >> > > > > > > > > > >> > > > > > > > - Change the index field type >> > from >> > >> int >> > >> > > to >> > >> > > > > int[]. >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > - Add a new method int[] >> > >> > > > getFieldIndexArray(). >> > >> > > > > > > > > > >> > > > > > > > - Deprecate the int >> > getFieldIndex() >> > >> > > > method, >> > >> > > > > the >> > >> > > > > > > > code >> > >> > > > > > > > > > >> will >> > >> > > > > > > > > > >> > be >> > >> > > > > > > > > > >> > > > > > removed >> > >> > > > > > > > > > >> > > > > > > in >> > >> > > > > > > > > > >> > > > > > > > the next major version bump. >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > I assume getFieldIndex would return >> > >> > > > > fieldIndexArray[0], >> > >> > > > > > > > > > right? >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > Thanks >> > >> > > > > > > > > > >> > > > > > > Venkat >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM >> Becket >> > >> Qin < >> > >> > > > > > > > > > >> becket....@gmail.com >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > > > > wrote: >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > > Thanks for the proposal, Venkata. >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > The current proposal works if we >> do >> > not >> > >> > want >> > >> > > > to >> > >> > > > > > > > migrate >> > >> > > > > > > > > > >> > > > > > > > SupportsFilterPushdown to also use >> > >> > > > > > > > > > >> > NestedFieldReferenceExpression >> > >> > > > > > > > > > >> > > > in >> > >> > > > > > > > > > >> > > > > > the >> > >> > > > > > > > > > >> > > > > > > > long term. >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > Did you mean >> *FieldReferenceExpression* >> > >> > > instead >> > >> > > > of >> > >> > > > > > > > > > >> > > > > > > *NestedFieldReferenceExpression*? >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > Otherwise, the alternative >> solution >> > >> > briefly >> > >> > > > > mentioned >> > >> > > > > > > > in >> > >> > > > > > > > > > the >> > >> > > > > > > > > > >> > > > rejected >> > >> > > > > > > > > > >> > > > > > > > alternatives would be the >> following: >> > >> > > > > > > > > > >> > > > > > > > Phase 1: >> > >> > > > > > > > > > >> > > > > > > > 1. Introduce a >> > supportsNestedFilters() >> > >> > > method >> > >> > > > > to the >> > >> > > > > > > > > > >> > > > > > > SupportsFilterPushdown >> > >> > > > > > > > > > >> > > > > > > > interface. (same as current >> > proposal). >> > >> > > > > > > > > > >> > > > > > > > 2. Extend the >> > FieldReferenceExpression >> > >> to >> > >> > > > > support >> > >> > > > > > > > nested >> > >> > > > > > > > > > >> > fields. >> > >> > > > > > > > > > >> > > > > > > > - Change the index field type >> > from >> > >> int >> > >> > > to >> > >> > > > > int[]. >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > - Add a new method int[] >> > >> > > > getFieldIndexArray(). >> > >> > > > > > > > > > >> > > > > > > > - Deprecate the int >> > getFieldIndex() >> > >> > > > method, >> > >> > > > > the >> > >> > > > > > > > code >> > >> > > > > > > > > > >> will >> > >> > > > > > > > > > >> > be >> > >> > > > > > > > > > >> > > > > > removed >> > >> > > > > > > > > > >> > > > > > > in >> > >> > > > > > > > > > >> > > > > > > > the next major version bump. >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > > 3. In the SupportsProjectionPushDown >> > >> > interface >> > >> > > > > > > > > > >> > > > > > > > - add a new method >> > >> > > > > > > > > > >> > > > > >> > >> applyProjection(List<FieldReferenceExpression>, >> > >> > > > > > > > > > >> > > > > > > > DataType), with default >> > implementation >> > >> > > > invoking >> > >> > > > > > > > > > >> > > > > > applyProjection(int[][], >> > >> > > > > > > > > > >> > > > > > > > DataType) >> > >> > > > > > > > > > >> > > > > > > > - deprecate the current >> > >> > > > > applyProjection(int[][], >> > >> > > > > > > > > > >> DataType) >> > >> > > > > > > > > > >> > > > method >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > Phase 2 (in the next major version >> > >> bump) >> > >> > > > > > > > > > >> > > > > > > > 1. remove the deprecated methods. >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > Phase 3 (optional) >> > >> > > > > > > > > > >> > > > > > > > 1. deprecate and remove the >> > >> > > > > supportsNestedFilters() / >> > >> > > > > > > > > > >> > > > > > > > supportsNestedProjection() methods >> > from >> > >> > the >> > >> > > > > > > > > > >> > > SupportsFilterPushDown >> > >> > > > > > > > > > >> > > > / >> > >> > > > > > > > > > >> > > > > > > > SupportsProjectionPushDown >> > interfaces. >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > Personally I prefer this >> alternative. >> > >> It >> > >> > > takes >> > >> > > > > longer >> > >> > > > > > > > to >> > >> > > > > > > > > > >> finish >> > >> > > > > > > > > > >> > > the >> > >> > > > > > > > > > >> > > > > > work, >> > >> > > > > > > > > > >> > > > > > > > but the API eventually becomes >> clean >> > >> and >> > >> > > > > consistent. >> > >> > > > > > > > > But I >> > >> > > > > > > > > > >> can >> > >> > > > > > > > > > >> > > live >> > >> > > > > > > > > > >> > > > > > with >> > >> > > > > > > > > > >> > > > > > > > the current proposal. >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > Thanks, >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM >> > >> > > > Venkatakrishnan >> > >> > > > > > > > > > Sowrirajan >> > >> > > > > > > > > > >> < >> > >> > > > > > > > > > >> > > > > > > > vsowr...@asu.edu> wrote: >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > Gentle ping for >> reviews/feedback. >> > >> > > > > > > > > > >> > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM >> > >> > > > Venkatakrishnan >> > >> > > > > > > > > > Sowrirajan < >> > >> > > > > > > > > > >> > > > > > > > vsowr...@asu.edu >> > >> > > > > > > > > > >> > > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > wrote: >> > >> > > > > > > > > > >> > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > > Hi All, >> > >> > > > > > > > > > >> > > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > > I am opening this thread to >> > discuss >> > >> > > > > FLIP-356: >> > >> > > > > > > > > Support >> > >> > > > > > > > > > >> > Nested >> > >> > > > > > > > > > >> > > > > Fields >> > >> > > > > > > > > > >> > > > > > > > > > Filter Pushdown. The FLIP can >> be >> > >> found >> > >> > > at >> > >> > > > > > > > > > >> > > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > >> >> > >> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ >> > >> > > > > > > > > > >> > > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > > This FLIP adds support for >> > pushing >> > >> > down >> > >> > > > > nested >> > >> > > > > > > > > fields >> > >> > > > > > > > > > >> > filters >> > >> > > > > > > > > > >> > > > to >> > >> > > > > > > > > > >> > > > > > the >> > >> > > > > > > > > > >> > > > > > > > > > underlying TableSource. In our >> > data >> > >> > > lake, >> > >> > > > > we find >> > >> > > > > > > > a >> > >> > > > > > > > > > lot >> > >> > > > > > > > > > >> of >> > >> > > > > > > > > > >> > > > > datasets >> > >> > > > > > > > > > >> > > > > > > > have >> > >> > > > > > > > > > >> > > > > > > > > > nested fields and also user >> > queries >> > >> > with >> > >> > > > > filters >> > >> > > > > > > > > > >> defined on >> > >> > > > > > > > > > >> > > the >> > >> > > > > > > > > > >> > > > > > > nested >> > >> > > > > > > > > > >> > > > > > > > > > fields. This would drastically >> > >> improve >> > >> > > the >> > >> > > > > > > > > performance >> > >> > > > > > > > > > >> for >> > >> > > > > > > > > > >> > > > those >> > >> > > > > > > > > > >> > > > > > sets >> > >> > > > > > > > > > >> > > > > > > > of >> > >> > > > > > > > > > >> > > > > > > > > > queries. >> > >> > > > > > > > > > >> > > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > > Appreciate any comments or >> > feedback >> > >> > you >> > >> > > > may >> > >> > > > > have >> > >> > > > > > > > on >> > >> > > > > > > > > > this >> > >> > > > > > > > > > >> > > > > proposal. >> > >> > > > > > > > > > >> > > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > > Regards >> > >> > > > > > > > > > >> > > > > > > > > > Venkata krishnan >> > >> > > > > > > > > > >> > > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > > >> > >> > > > > > > > > > >> > > > > > > > >> > >> > > > > > > > > > >> > > > > > > >> > >> > > > > > > > > > >> > > > > > >> > >> > > > > > > > > > >> > > > > >> > >> > > > > > > > > > >> > > > >> > >> > > > > > > > > > >> > > >> > >> > > > > > > > > > >> > >> > >> > > > > > > > > > >> >> > >> > > > > > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > >> > >> > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > >> >> > > >> > >> >