Hi,
I have an optimizer that uses top-down VolcanoPlanner and has a
ConverterRule for every LogicalNode. I have a new requirement when one of
the physical rules must emit several physical nodes instead of one. I tried
to convert a ConverterRule to a normal rule that emits physical nodes. Even
though the new rule has exactly the same pattern and logic as the previous
ConverterRule, plans changed. Analysis showed that this likely due to a bug
in the top-down optimizer as explained below.
When optimizing a logical node, the top-down first schedules the
transformation rules, and then implementation rules. The logic to check
whether the rule is transformation rule or not is located in
VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic ensures
that a given rule is executed either as a transformation rule, or an
implementation rule, but not both. See TopDowRuleQueue.popMatch. The
top-down optimizer schedules tasks in a stack. So even though the
transformation rules are scheduled before implementation rules, the latter
executed first.
If an implementation rule produces a physical node, this node will be
notified about input traits in the "derive" phase. In contrast,
transformation rules produce logical nodes only, and this happens after the
derivation of the inputs is completed. Therefore, if the top-down optimizer
mistakenly treats an implementation rule as a transformation rule, "derive"
will not be called on the produced physical nodes, leading to incomplete
search space exploration.
It seems, that this is exactly what happens in the current implementation.
The VolcanoPlanner.isTransformationRule looks like this:
1: protected boolean isTransformationRule(VolcanoRuleCall match) {
2: if (match.getRule() instanceof SubstitutionRule) {
3: return true;
4: }
5: if (match.getRule() instanceof ConverterRule
6: && match.getRule().getOutTrait() == rootConvention) {
7: return false;
8: }
9: return match.getRule().getOperand().trait == Convention.NONE
10: || match.getRule().getOperand().trait == null;
11: }
If the rule is a ConverterRule and it produces the node with the target
convention, it is treated as an implementation rule (lines 5-6). But if the
rule is not a ConverterRule, the method tries to deduce the rule's type
from the incoming convention (lines 9-10). In practice, implementation
rules either do not care about the incoming trait or expect the NONE trait.
Therefore, it seems that currently, the top-down optimizer treats many
implementation rules as physical rules, and as a result, cannot notify
physical nodes produced from these rules about trait derivation.
This explains why in my case everything was ok when all implementation
rules were ConverterRules, and why I lost some optimal plans when the rule
was refactored to a non-converter variant.
Do you agree that this a bug? If yes, shouldn't we refactor that code to
just check whether the rule is an instance of TransformationRule? Since
this is a breaking change, we may add a special flag that preserves the old
behavior by default but allows for the new behavior to overcome the
aforementioned problem.
Regards,
Vladimir.
[1]
https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408