Hi Jinpeng, Thank you for the clarification. When I saw the code in question for the first time, my first thought was that it was perhaps designed for gradual migration. The main problem is that the current implementation discards parts of the plan *silently*, which might be difficult to spot. I only spotted the problem in my specific case because I had ~100 tests with complex queries. Otherwise, I would happily proceed with the new rule without knowing that I lost important parts of the search space.
That said, I think we can do the following: 1. Emit a warning if or even throw an exception if the transformation rule produced a physical node. This should be trivial to implement - add an expected node type to VolcanoRuleCall (e.g., "logical", "physical", "any"). The warning/exception should contain a proper fix suggestion - to override the VolcanoPlanner.isTransformationRule. 2. Alternatively - do a breaking change. Apache Calcite doesn't have a major release cadence. It is normal practice in many products to do breaking changes in minor releases. Even popular products like Mongo or DataStax do it regularly. We may inform the user in the first release and change to "rule instanceof TransformationRule" in the next release. Does it make sense? Regards, Vladimir. пт, 28 мая 2021 г. в 19:33, Jinpeng Wu <[email protected]>: > Hi, Vladimir. Good catch! There could be some improvements here. > > Actually, this problem was discovered early when the top-down rule driver > was designed. At that time, no rule was annotated as "TransformationRule". > Moreover, it is impossible to ask every calcite user who designed their own > rules to annotate the existing code. So the top-down rule driver was > designed so that it can: > 1. Work in chaos: even if there are no hints for rule types, it can still > work. Some opportunities may be lost, but NO failures, NO exceptions, and > NO worse than the original driver. People can migrate to the new driver > without concern. > 2. Be Improvable: Users can refactor their code, if they want, step by > step. As rule types become more and more accurate, the system achieves more > and more benefits > 3. Be easy to customize: the default implementation is designed for the > most common cases, so that most users can benefit from it without much > effort. But it is not possible to fulfill all requirements as different > systems could have very different patterns to define logical and > physical. That's why the isTransformationRule method is put in > VolcanoPlanner and marked as protected: overwriting it can be very simple. > > Moreover, losing some "derive" opportunities is not as serious as > imagination. As I mentioned in previous discussions, parents are in charge > of raising as many requirements as possible. During "derive", if specific > traits were not built by children, it means that no parents were requiring > that. And if parents finally require that traits in the latter > optimization, passThrough methods get called and new physical nodes are > generated and "derive" get called again. > I tested it on millions of queries, with or without correct rule types, in > my own product. The performance of group pruning varies a lot. But the > output plans are almost the same. Only one obvious exception was > discovered: the spool node. That's because spool nodes cannot "passThough" > parent traits (it could have multiple parents and current framework cannot > handle such a situation) while it can "derive" input traits. > > Of course, this conclusion may not apply to your product as we could have > quite different rule sets. I am just sharing some of my experiences. Maybe > the current implementation of "isTransformationRule" is not good enough. If > you have any better solutions, please share them. > > Thanks, > Jinpeng Wu > > On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov <[email protected]> > wrote: > > > Hi, > > > > I have an optimizer that uses top-down VolcanoPlanner and has a > > ConverterRule for every LogicalNode. I have a new requirement when one of > > the physical rules must emit several physical nodes instead of one. I > tried > > to convert a ConverterRule to a normal rule that emits physical nodes. > Even > > though the new rule has exactly the same pattern and logic as the > previous > > ConverterRule, plans changed. Analysis showed that this likely due to a > bug > > in the top-down optimizer as explained below. > > > > When optimizing a logical node, the top-down first schedules the > > transformation rules, and then implementation rules. The logic to check > > whether the rule is transformation rule or not is located in > > VolcanoPlanner.isTransformationRule [1]. The rule scheduling logic > ensures > > that a given rule is executed either as a transformation rule, or an > > implementation rule, but not both. See TopDowRuleQueue.popMatch. The > > top-down optimizer schedules tasks in a stack. So even though the > > transformation rules are scheduled before implementation rules, the > latter > > executed first. > > > > If an implementation rule produces a physical node, this node will be > > notified about input traits in the "derive" phase. In contrast, > > transformation rules produce logical nodes only, and this happens after > the > > derivation of the inputs is completed. Therefore, if the top-down > optimizer > > mistakenly treats an implementation rule as a transformation rule, > "derive" > > will not be called on the produced physical nodes, leading to incomplete > > search space exploration. > > > > It seems, that this is exactly what happens in the current > implementation. > > The VolcanoPlanner.isTransformationRule looks like this: > > > > 1: protected boolean isTransformationRule(VolcanoRuleCall match) { > > 2: if (match.getRule() instanceof SubstitutionRule) { > > 3: return true; > > 4: } > > 5: if (match.getRule() instanceof ConverterRule > > 6: && match.getRule().getOutTrait() == rootConvention) { > > 7: return false; > > 8: } > > 9: return match.getRule().getOperand().trait == Convention.NONE > > 10: || match.getRule().getOperand().trait == null; > > 11: } > > > > If the rule is a ConverterRule and it produces the node with the target > > convention, it is treated as an implementation rule (lines 5-6). But if > the > > rule is not a ConverterRule, the method tries to deduce the rule's type > > from the incoming convention (lines 9-10). In practice, implementation > > rules either do not care about the incoming trait or expect the NONE > trait. > > Therefore, it seems that currently, the top-down optimizer treats many > > implementation rules as physical rules, and as a result, cannot notify > > physical nodes produced from these rules about trait derivation. > > > > This explains why in my case everything was ok when all implementation > > rules were ConverterRules, and why I lost some optimal plans when the > rule > > was refactored to a non-converter variant. > > > > Do you agree that this a bug? If yes, shouldn't we refactor that code to > > just check whether the rule is an instance of TransformationRule? Since > > this is a breaking change, we may add a special flag that preserves the > old > > behavior by default but allows for the new behavior to overcome the > > aforementioned problem. > > > > Regards, > > Vladimir. > > > > [1] > > > > > https://github.com/apache/calcite/blob/calcite-1.26.0/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L1398-L1408 > > >
