Thanks for the pointer! It helps a lot. We hadn't thought to look up metadata in our own rules like this, but it makes sense.
On Wed, 4 Sept 2024 at 10:57, Julian Hyde <jhyde.apa...@gmail.com> wrote: > +1 using metadata > > In query planning there are only 3 places: rules, RelNodes, metadata. We > have tried putting ’state’ in rules and RelNodes, and it hasn’t worked out > too well. > > Metadata is superior to RelNodes for making decisions during planning. > Suppose you want to figure out whether to push down a filter ‘x > 0’. You > could stop when you see a Filter RelNode with x > 0 (or indeed x > 10). But > it’s better to look for any RelNode with a predicate ‘x > 0’. Metadata > propagates through RelNodes, and belongs to entire equivalence sets. You > don’t need to mutate a RelNode r to give it a property - you just add a > RelNode with that property to an equivalence set that r belongs to. > > Julian > > > > > On Sep 3, 2024, at 4:22 AM, Stamatis Zampetakis <zabe...@gmail.com> > wrote: > > > > Hey Victor, > > > > In general it is best to avoid stateful rules but in some cases it may > > be unavoidable. A common way to check state for deciding whether to > > apply a transformation (or not) is via the metadata mechanism > > (RelMetadataQuery). > > > > It seems that when the union is already subject to a limit then the > > new rule should not trigger. To achieve that one way would be to > > introduce a condition that relies on RelMetadataQuery#getMaxRowCount > > of the union (or its inputs) and bail out when the optimization is not > > gonna bring some notable benefit. > > > > ``` > > LogicalSort[fetch = 10] > > LogicalUnion[all=true] > > LogicalSort[fetch = 10] > > <Input 1> > > LogicalSort[fetch = 10] > > <Input 2> > > ``` > > In the plan above the max row count of the union should be 20 and > > pushing a limit 10 is not gonna change that so the rule should detect > > that (e.g., via getMaxRowCount) and not trigger again. > > > > If you start working with metadata classes you may find that some > > cases are not fully handled so don't hesitate to raise JIRAs/PRs to > > improve this area. > > > > Best, > > Stamatis > > > > On Thu, Aug 29, 2024 at 11:31 PM Victor Barua > > <victor.ba...@datadoghq.com.invalid> wrote: > >> > >> Hello! > >> > >> We've been attempting to implement a simple optimization rule in which > we > >> duplicate a limit through a UNION ALL to reduce the amount of data we > need > >> to fetch for a query. > >> > >> Starting from something like > >> ``` > >> LogicalSort[fetch = 10] > >> LogicalUnion[all=true] > >> <Input 1> > >> <Input 2> > >> ``` > >> > >> We're trying to turn it into > >> ``` > >> LogicalSort[fetch = 10] > >> LogicalUnion[all=true] > >> LogicalSort[fetch = 10] > >> <Input 1> > >> LogicalSort[fetch = 10] > >> <Input 2> > >> ``` > >> > >> This, somewhat expectedly, causes issues with the VolcanoPlanner because > >> the newly generated relation is also a candidate for our rule so we end > up > >> with an infinite planning loop. We tried to take inspiration from the > >> JoinCommuteRule, which uses the ensureRegistered method to prevent this > (or > >> at least that's what we think it's doing). Unfortunately, in our case > this > >> appears to be insufficient. I would appreciate any pointers and or > >> suggestions around this. I've included the code for the raw rule below. > >> > >> > >> ``` > >> > >> @Value.Enclosing > >> public class LimitThroughUnionRule extends > RelRule<LimitThroughUnionRule.Config> > >> implements SubstitutionRule { > >> > >> public static final LimitThroughUnionRule INSTANCE = > >> LimitThroughUnionRule.Config.DEFAULT.toRule(); > >> > >> protected LimitThroughUnionRule(Config config) { > >> super(config); > >> } > >> > >> private RelNode pushLimitThrough(RelBuilder relBuilder, Sort sort, > >> Union union) { > >> for (RelNode unionInput : union.getInputs()) { > >> relBuilder.push(unionInput).sortLimit(sort.offset, sort.fetch, > >> Collections.emptyList()); > >> } > >> relBuilder.union(true); > >> relBuilder.sortLimit(sort.offset, sort.fetch, sort.getSortExps()); > >> return relBuilder.build(); > >> } > >> > >> @Override > >> public void onMatch(RelOptRuleCall call) { > >> Sort sort = call.rel(0); > >> Union union = call.rel(1); > >> > >> RelNode newNode = pushLimitThrough(call.builder(), sort, union); > >> RelNode nextNode = > >> pushLimitThrough(call.builder(), (Sort) newNode, (Union) > >> newNode.getInput(0)); > >> > >> call.transformTo(newNode); > >> call.getPlanner().ensureRegistered(nextNode, newNode); > >> } > >> > >> @Value.Immutable > >> public interface Config extends RelRule.Config { > >> LimitThroughUnionRule.Config DEFAULT = > >> ImmutableLimitThroughUnionRule.Config.builder() > >> .operandSupplier( > >> b0 -> > >> b0.operand(Sort.class) > >> .predicate(sort -> !(RelOptUtil.isOrder(sort) > >> || RelOptUtil.isOffset(sort))) > >> .oneInput(u -> > u.operand(Union.class).anyInputs())) > >> .build(); > >> > >> @Override > >> default LimitThroughUnionRule toRule() { > >> return new LimitThroughUnionRule(this); > >> } > >> } > >> } > >> > >> ``` > >