I see. RelFieldTrimmer could insert a Project right after the Filter, but in most calling conventions that plan would probably be less efficient. The field will be removed next time there is a Project or Aggregate.
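A rough way to see why the field survives is the toy model sketched here. It is plain Java 17, not Calcite code: it passes "fields used" sets down the tree and "fields kept" lists back up, and only a Project (or a Project placed over a Scan) can narrow the row. Every name in it (TrimSketch, Node, Result, trimmedLeftWidth) is made up for illustration; the real RelFieldTrimmer tracks mappings and row types in much more detail, and Aggregate is not modeled at all. It is run on the plan from the original message, with and without an identity Project over the Filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of top-down field trimming. NOT Calcite's RelFieldTrimmer. */
public class TrimSketch {
  interface Node {}
  record Scan(int width) implements Node {}
  record Filter(Node input, int condField) implements Node {}   // e.g. $condField < 10
  record Project(Node input, List<Integer> fields) implements Node {}
  /** rightKey is an index into the right input, not into the joined row. */
  record Join(Node left, Node right, int leftKey, int rightKey) implements Node {}
  /** kept.get(i) is the original index of the i-th field of the trimmed node. */
  record Result(Node node, List<Integer> kept) {}

  static int width(Node n) {
    if (n instanceof Scan s) return s.width();
    if (n instanceof Filter f) return width(f.input());
    if (n instanceof Project p) return p.fields().size();
    Join j = (Join) n;
    return width(j.left()) + width(j.right());
  }

  /** Trims n, given the set of n's output fields that its parent uses. */
  static Result trim(Node n, TreeSet<Integer> used) {
    if (n instanceof Scan s) {
      // A scan is narrowed by placing a Project over it.
      List<Integer> kept = new ArrayList<>(used);
      return used.size() == s.width()
          ? new Result(s, kept) : new Result(new Project(s, kept), kept);
    }
    if (n instanceof Filter f) {
      // The filter needs its condition field in addition to the parent's fields.
      TreeSet<Integer> inUsed = new TreeSet<>(used);
      inUsed.add(f.condField());
      Result in = trim(f.input(), inUsed);
      // A filter's row type equals its input's, so it cannot drop anything.
      return new Result(
          new Filter(in.node(), in.kept().indexOf(f.condField())), in.kept());
    }
    if (n instanceof Project p) {
      // A project keeps only the expressions its parent uses.
      TreeSet<Integer> inUsed = new TreeSet<>();
      for (int i : used) inUsed.add(p.fields().get(i));
      Result in = trim(p.input(), inUsed);
      List<Integer> fields = new ArrayList<>();
      for (int i : used) fields.add(in.kept().indexOf(p.fields().get(i)));
      return new Result(new Project(in.node(), fields), new ArrayList<>(used));
    }
    // A join needs its key fields, and returns whatever its inputs returned.
    Join j = (Join) n;
    int lw = width(j.left());
    TreeSet<Integer> lUsed = new TreeSet<>(), rUsed = new TreeSet<>();
    lUsed.add(j.leftKey());
    rUsed.add(j.rightKey());
    for (int i : used) { if (i < lw) lUsed.add(i); else rUsed.add(i - lw); }
    Result l = trim(j.left(), lUsed), r = trim(j.right(), rUsed);
    List<Integer> kept = new ArrayList<>(l.kept());
    for (int i : r.kept()) kept.add(i + lw);
    Node join = new Join(l.node(), r.node(),
        l.kept().indexOf(j.leftKey()), r.kept().indexOf(j.rightKey()));
    return new Result(join, kept);
  }

  /** Builds the plan from the thread and returns the trimmed join's left width. */
  static int trimmedLeftWidth(boolean withIdentityProject) {
    Node left = new Filter(new Scan(6), 2);
    if (withIdentityProject) left = new Project(left, List.of(0, 1, 2, 3, 4, 5));
    Node root = new Project(new Join(left, new Scan(64), 1, 0), List.of(4, 8));
    Project top = (Project) trim(root, new TreeSet<>(List.of(0, 1))).node();
    return width(((Join) top.input()).left());
  }

  public static void main(String[] args) {
    System.out.println(trimmedLeftWidth(false)); // prints 3: Filter directly under Join
    System.out.println(trimmedLeftWidth(true));  // prints 2: identity Project in between
  }
}
```

In this model the Filter hands its three-field row straight through to the Join, while the identity Project gives the trimmer a place to drop the filter-only field. That matches the two trimmed plans in the original message.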
> On Mar 4, 2025, at 4:33 PM, Steven Phillips <ste...@dremio.com.invalid> wrote:
>
> Julian,
> The input column $1 is needed for the filter condition on the node below
> the join, and not needed for anything else above that. The join tells
> the filter below that it doesn't need that field. But the filter itself
> does need the field. And filters don't have the ability to remove fields
> (i.e. the rowtype of a filter is always the same as its input), so it
> returns a rowtype and mapping to the join above it that includes the field
> that's not needed. Joins also don't have the ability to trim fields, so the
> join returns a rowtype and mapping to the node above that includes the
> field. So no one is "wrongly" telling its input 'I need all of your fields'.
>
> Contrast the situation where there is a Project on top of the Filter. Join
> passes down to the Project that it doesn't need that column. Project passes
> down to the Filter that it doesn't need that column. Filter does need it,
> so it keeps it, and returns a rowtype/mapping that includes the column.
> Project doesn't need it, so it drops that field from the project
> expression, and returns a rowtype/mapping that doesn't include the field.
> And so on.
>
> On Tue, Mar 4, 2025 at 4:00 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
>> I don’t think I understand this conversation. RelFieldTrimmer is intended
>> to be invoked on the whole tree. Each node, when invoking the trimmer on
>> its input (child), tells the trimmer which of the fields of that input it
>> actually uses. Now ‘which fields it actually uses’ is based on the fields
>> that its consumer (parent) said that it was using.
>>
>> If fields are not being trimmed as expected, look for one node that is
>> wrongly telling its input ‘I need all of your fields’.
>>
>> Julian
>>
>>> On Mar 4, 2025, at 2:50 PM, Ian Bertolacci
>>> <ian.bertola...@workday.com.invalid> wrote:
>>>
>>> I just hacked together an override where it will build a redundant
>>> project on each side if necessary.
>>> That should eliminate any overhead of invoking any planners or rules.
>>> (For our needs, additional projects have no performance implications.)
>>> -Ian
>>>
>>> From: Ian Bertolacci <ian.bertola...@workday.com.INVALID>
>>> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Date: Tuesday, March 4, 2025 at 14:25
>>> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
>>>
>>>> I think you could work around this by always inserting trivial projects
>>>> over every node in the tree before trimming, and then clean up with
>>>> ProjectRemoveRule.
>>>
>>> This is pretty much exactly what I was doing.
>>> Good to know that I’m not wildly off-track.
>>> Thanks!
>>> -Ian
>>>
>>> From: Steven Phillips <ste...@dremio.com.INVALID>
>>> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Date: Tuesday, March 4, 2025 at 13:55
>>> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
>>>
>>> I think this is a current limitation of FieldTrimmer. The Join and Filter
>>> nodes can't drop columns (since they don't carry column selection
>>> information), and the trimmer doesn't add Project nodes (currently). I have
>>> worked around this limitation by using HepPlanner with various
>>> ProjectTranspose rules.
>>>
>>> I think you could work around this by always inserting trivial projects
>>> over every node in the tree before trimming, and then clean up with
>>> ProjectRemoveRule.
>>>
>>> On Tue, Mar 4, 2025 at 1:33 PM Ian Bertolacci
>>> <ian.bertola...@workday.com.invalid> wrote:
>>>
>>>> I’m looking at using RelFieldTrimmer, and I’m noticing that if a side of a
>>>> join has unnecessary fields after a filter, there is no trim-fields project
>>>> on that side to reduce the width of the row.
>>>> Is this expected, or is there a configuration or pre-processing step that
>>>> I am missing?
>>>>
>>>> For example, starting with this tree (these all look better in monospace,
>>>> hopefully the formatting comes through):
>>>> 4:Project(C5633_14509=[$4], C5633_486=[$8])
>>>> └── 3:Join(condition=[=($1, $6)], joinType=[inner])
>>>> ....├── 1:Filter(condition=[<($2, 10)])
>>>> ....│...└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 2:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> The result of RelFieldTrimmer is this:
>>>> 9:Project(C5633_14509=[$2], C5633_486=[$4])
>>>> └── 8:Join(condition=[=($0, $3)], joinType=[inner])
>>>> ....├── 6:Filter(condition=[<($1, 10)])
>>>> ....│...└── 5:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
>>>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 7:Project(ID=[$0], C5633_486=[$2])
>>>> ........└── 2:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> Notice: $1 on the LHS of the node is not used *after* the filter, so a
>>>> projection of only the $0 and $2 fields would reduce the width of the
>>>> row before the join.
>>>>
>>>> However, I can force the insertion of a projection which is simply the
>>>> identity (i.e., projecting all fields of the input row with no additions or
>>>> subtractions):
>>>> 5:Project(C5633_14509=[$4], C5633_486=[$8])
>>>> └── 4:Join(condition=[=($1, $6)], joinType=[inner])
>>>> ....├── 2:Project(...Identity mapping, 6 fields...)
>>>> ....│...└── 1:Filter(condition=[<($2, 10)])
>>>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 3:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> And the result is a projection which only has the 2 fields necessary after
>>>> the filter:
>>>> 11:Project(C5633_14509=[$1], C5633_486=[$3])
>>>> └── 10:Join(condition=[=($0, $2)], joinType=[inner])
>>>> ....├── 8:Project(C5633_14505=[$0], C5633_14509=[$2]) <- trimmed
>>>> ....│...└── 7:Filter(condition=[<($1, 10)])
>>>> ....│.......└── 6:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
>>>> ....│...........└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 9:Project(ID=[$0], C5633_486=[$2])
>>>> ........└── 3:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> Thanks!
>>>> -Ian