Re: RelFieldTrimmer not optimally trimming after filters under joins?

Julian Hyde Tue, 04 Mar 2025 16:41:01 -0800

I see. RelFieldTrimmer could insert a Project right after the Filter, but in 
most calling conventions that plan would probably be less efficient. The field 
will be removed next time there is a Project or Aggregate.
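
To make the mechanics discussed in this thread concrete, here is a minimal
sketch of top-down field trimming. It is a toy model, not Calcite's actual
RelFieldTrimmer code: the names TrimSketch, Node, Trimmed, trim and remap are
all hypothetical, the join is left out, and a node is reduced to the list of
input ordinals it references. It assumes the join asks its left input for
fields $1 and $4 of the 6-field scan, as in the plans quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Toy model of top-down field trimming. Hypothetical; not Calcite code. */
final class TrimSketch {
  enum Kind { SCAN, FILTER, PROJECT }

  /** Plan node; refs are the input ordinals it references (a scan lists its own fields). */
  static final class Node {
    final Kind kind;
    final List<Integer> refs;
    final Node input;
    Node(Kind kind, List<Integer> refs, Node input) {
      this.kind = kind;
      this.refs = refs;
      this.input = input;
    }
    @Override public String toString() {
      return kind + refs.toString() + (input == null ? "" : " <- " + input);
    }
  }

  /** A trimmed node plus the original output ordinals it still produces, in order. */
  static final class Trimmed {
    final Node node;
    final List<Integer> kept;
    Trimmed(Node node, List<Integer> kept) {
      this.node = node;
      this.kept = kept;
    }
  }

  static Node scan(int width) {
    List<Integer> fields = new ArrayList<>();
    for (int i = 0; i < width; i++) {
      fields.add(i);
    }
    return new Node(Kind.SCAN, fields, null);
  }

  static Node filter(Node input, Integer... conditionRefs) {
    return new Node(Kind.FILTER, Arrays.asList(conditionRefs), input);
  }

  /** Identity projection over an input that has the given number of fields. */
  static Node identityProject(Node input, int width) {
    return new Node(Kind.PROJECT, scan(width).refs, input);
  }

  /** Trims n, given the output ordinals of n that its consumer uses. */
  static Trimmed trim(Node n, Collection<Integer> used) {
    List<Integer> usedSorted = new ArrayList<>(new TreeSet<>(used));
    switch (n.kind) {
      case SCAN:
        // Modeled as a narrowing Project placed on top of the scan.
        return new Trimmed(new Node(Kind.PROJECT, usedSorted, n), usedSorted);
      case FILTER: {
        // The filter needs its consumer's fields plus its own condition fields...
        TreeSet<Integer> needed = new TreeSet<>(used);
        needed.addAll(n.refs);
        Trimmed in = trim(n.input, needed);
        // ...and its row type equals its input's, so it cannot drop any of them.
        return new Trimmed(new Node(Kind.FILTER, remap(n.refs, in.kept), in.node), in.kept);
      }
      default: {
        // PROJECT: keep only the used expressions; this is where a field can go away.
        List<Integer> exprs = new ArrayList<>();
        for (int i : usedSorted) {
          exprs.add(n.refs.get(i));
        }
        Trimmed in = trim(n.input, exprs);
        return new Trimmed(new Node(Kind.PROJECT, remap(exprs, in.kept), in.node), usedSorted);
      }
    }
  }

  /** Rewrites original ordinals as positions within the trimmed input. */
  static List<Integer> remap(List<Integer> refs, List<Integer> kept) {
    List<Integer> out = new ArrayList<>();
    for (Integer r : refs) {
      out.add(kept.indexOf(r));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> usedByJoin = Arrays.asList(1, 4); // join key $1, projected column $4
    Trimmed bare = trim(filter(scan(6), 2), usedByJoin);
    Trimmed wrapped = trim(identityProject(filter(scan(6), 2), 6), usedByJoin);
    System.out.println(bare.kept.size() + " fields: " + bare.node);
    System.out.println(wrapped.kept.size() + " fields: " + wrapped.node);
  }
}
```

In this model the bare Filter hands three fields up to the join (the two the
join asked for plus the one used only by the filter condition), while the
Filter wrapped in an identity Project hands up two. Whether the extra Project
is worth having in a real plan depends on the calling convention, as noted
above.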


> On Mar 4, 2025, at 4:33 PM, Steven Phillips <ste...@dremio.com.invalid> wrote:
> 
> Julian,
> The input column $1 is needed for the filter condition on the node below
> the join, and not needed for anything else above that. The join tells
> the filter below that it doesn't need that field. But the filter itself
> does need the field. And filters don't have the ability to remove fields
> (i.e. the rowtype of a filter is always the same as its input), so it
> returns a rowtype and mapping to the join above it that includes the field
> that's not needed. Joins also don't have the ability to trim fields, so it
> returns a rowtype and mapping to the node above that includes the field. So
> no one is "wrongly" telling its input 'I need all of your fields'.
> 
> Contrast the situation where there is a Project on top of the Filter. Join
> passes down to the Project that it doesn't need that column. Project passes
> down to the Filter that it doesn't need that column. Filter does need it,
> so it keeps it, and returns a rowtype/mapping that includes the column.
> Project doesn't need it, so it drops that field from the project
> expression, and returns a rowtype/mapping that doesn't include the field.
> And so on.
> 
> On Tue, Mar 4, 2025 at 4:00 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> 
>> I don’t think I understand this conversation. RelFieldTrimmer is intended
>> to be invoked on the whole tree. Each node, when invoking the trimmer on
>> its input (child), tells the trimmer which of the fields of that input it
>> actually uses. Now ‘which fields it actually uses’ is based on the fields
>> that its consumer (parent) said that it was using.
>> 
>> If fields are not being trimmed as expected, look for one node that is
>> wrongly telling its input ‘I need all of your fields’.
>> 
>> Julian
>> 
>> 
>>> On Mar 4, 2025, at 2:50 PM, Ian Bertolacci <ian.bertola...@workday.com.invalid> wrote:
>>> 
>>> I just hacked together an override where it will build a redundant
>>> project on each side if necessary.
>>> That should eliminate any overhead of invoking any planners or rules.
>>> (For our needs, additional projects have no performance implications.)
>>> -Ian
>>> 
>>> From: Ian Bertolacci <ian.bertola...@workday.com.INVALID>
>>> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Date: Tuesday, March 4, 2025 at 14:25
>>> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
>>> 
>>>> I think you could work around this by always inserting trivial projects
>>>> over every node in the tree before trimming, and then clean up with
>>>> ProjectRemoveRule.
>>>
>>> This is pretty much exactly what I was doing.
>>> Good to know that I’m not wildly off-track.
>>> Thanks!
>>> -Ian
>>>
>>> From: Steven Phillips <ste...@dremio.com.INVALID>
>>> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Date: Tuesday, March 4, 2025 at 13:55
>>> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
>>> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
>>>
>>> I think this is a current limitation of RelFieldTrimmer. The Join and Filter
>>> nodes can't drop columns (since they don't carry column selection
>>> information), and the trimmer doesn't add Project nodes (currently). I have
>>> worked around this limitation by using HepPlanner with various
>>> ProjectTranspose rules.
>>>
>>> I think you could work around this by always inserting trivial projects
>>> over every node in the tree before trimming, and then clean up with
>>> ProjectRemoveRule.
>>>
>>> On Tue, Mar 4, 2025 at 1:33 PM Ian Bertolacci <ian.bertola...@workday.com.invalid> wrote:
>>>
>>>> I’m looking at using RelFieldTrimmer, and I’m noticing that if a side of a
>>>> join has unnecessary fields after a filter, there is no trim-fields project
>>>> on that side to reduce the width of the row.
>>>> Is this expected, or is there a configuration or pre-processing step that
>>>> I am missing?
>>>>
>>>> For example, starting with this tree (these all look better in monospace,
>>>> hopefully the formatting comes through)
>>>> 4:Project(C5633_14509=[$4], C5633_486=[$8])
>>>> └── 3:Join(condition=[=($1, $6)], joinType=[inner])
>>>> ....├── 1:Filter(condition=[<($2, 10)])
>>>> ....│...└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 2:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> The result of RelFieldTrimmer is this:
>>>> 9:Project(C5633_14509=[$2], C5633_486=[$4])
>>>> └── 8:Join(condition=[=($0, $3)], joinType=[inner])
>>>> ....├── 6:Filter(condition=[<($1, 10)])
>>>> ....│...└── 5:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
>>>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 7:Project(ID=[$0], C5633_486=[$2])
>>>> ........└── 2:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> Notice: $1 on the LHS of the join is not used *after* the filter, so a
>>>> projection of only the $0 and $2 fields would reduce the width of the
>>>> row before the join.
>>>>
>>>> However, I can force the insertion of a projection which is simply the
>>>> identity (i.e., projecting all fields of the input row with no additions or
>>>> subtractions):
>>>> 5:Project(C5633_14509=[$4], C5633_486=[$8])
>>>> └── 4:Join(condition=[=($1, $6)], joinType=[inner])
>>>> ....├── 2:Project(...Identity mapping, 6 fields...)
>>>> ....│...└── 1:Filter(condition=[<($2, 10)])
>>>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 3:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> And the result is a projection which only has the 2 fields necessary after
>>>> the filter.
>>>> 11:Project(C5633_14509=[$1], C5633_486=[$3])
>>>> └── 10:Join(condition=[=($0, $2)], joinType=[inner])
>>>> ....├── 8:Project(C5633_14505=[$0], C5633_14509=[$2]) <- trimmed
>>>> ....│...└── 7:Filter(condition=[<($1, 10)])
>>>> ....│.......└── 6:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
>>>> ....│...........└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>>>> ....└── 9:Project(ID=[$0], C5633_486=[$2])
>>>> ........└── 3:TableScan(table=[T895], Schema=[...64 fields...])
>>>>
>>>> Thanks!
>>>> -Ian
>>>>
>> 
>>
