[ 
https://issues.apache.org/jira/browse/CALCITE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-4920:
------------------------------------
    Labels: pull-request-available  (was: )

> Introduce logical space pruning to TopDownRuleDriver
> ----------------------------------------------------
>
>                 Key: CALCITE-4920
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4920
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Jinpeng Wu
>            Assignee: Jinpeng Wu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Last year, we submitted a PR introducing the TopDownRuleDriver. The rule 
> driver implements the top-down search strategy suggested by the Cascades 
> framework[1] and provides a basic branch-and-bound pruning mechanism based 
> on upper-bound and lower-bound costs, as suggested by the Columbia paper[2].
> However, the previous version of TopDownRuleDriver could only prune 
> implementation rules and enforcement rules, not transformation rules. The 
> reason is mainly about logical properties.
> In the classic Volcano/Cascades model, logical properties, such as output 
> row count, are bound to an equivalence set and never change during 
> optimization. The Columbia optimizer[2] depends heavily on this premise. 
> However, Calcite does not obey this rule: in Calcite, the logical 
> properties of a RelSubset are likely to change during optimization. 
> Calcite is not the only optimizer engine that suffers from this. The 
> logical properties of an equivalence set in Orca also change, and Orca 
> cannot do logical pruning either.
> How does the logical properties problem prevent logical pruning? Take this 
> plan as an example: sink <- op1 <- op2 <- scan.
> By applying a transformation rule, op1 <- op2 is transformed into 
> op3 <- op4. So we get a new alternative plan, say 
> sink <- op3 <- op4 <- scan, in which op3 is in the same equivalence set as 
> op1.
> After implementation and enforcement, the sub-plan (op1 <- op2 <- scan) 
> gets fully optimized and yields a winner with cost C1.
> Now we are going to optimize op3. We know that another plan in the same 
> equivalence set has a cost of C1, so we can use C1 as a cost limit while 
> optimizing op3. In the first step, we build op3 into a physical node, say 
> impl-op3, and compute its self-cost, SC3.
> Ideally, if SC3 is already greater than C1, we can decide that op3 will 
> never be part of the best plan, and thus the optimization of op4 can be 
> skipped. That is the basic idea of group pruning in the Columbia 
> optimizer[2].
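> To make the check concrete, here is a minimal sketch of that 
> branch-and-bound test. The names (upperBound, selfCost) and numbers are 
> illustrative only, not Calcite API:
> {code:java}
> // Columbia-style group pruning: if a candidate's self-cost alone already
> // exceeds the best complete plan found so far, no choice of inputs can
> // make it cheaper, so its inputs need not be optimized at all.
> double upperBound = 100.0; // C1: cost of the fully optimized winner
> double selfCost = 120.0;   // SC3: self-cost of impl-op3, inputs excluded
>
> boolean pruneInputs = selfCost > upperBound; // true: 120 > 100
> // When true, the optimization of op4 (and everything below it) is skipped.
> {code}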
> Here comes the problem: when we calculate the self-cost of impl-op3, we 
> need to leverage its metadata, such as row count, which in turn asks 
> impl-op3's input to derive its own metadata. However, the equivalence set 
> of op4 is not yet fully explored, and its row count may not be final. So 
> the self-cost of impl-op3 may be incorrect. If we apply group pruning 
> based on such a cost, op4 loses its opportunity to be explored, and with 
> it the opportunity to become part of the best plan.
> To ensure correctness, we require that all descendants be fully explored 
> when calculating a node's cost. That is why our first version of 
> TopDownRuleDriver only prunes implementation rules and enforcement rules.
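> The numbers below are purely illustrative, but they show why pruning on an 
> unstable self-cost is unsafe:
> {code:java}
> // Suppose impl-op3's self-cost is proportional to its input's row count.
> double upperBound = 100.0;                // C1
> double rowCountBeforeExploring = 1000.0;  // op4's current estimate
> double rowCountAfterExploring = 10.0;     // estimate once op4 is explored
>
> double sc3Early = 0.2 * rowCountBeforeExploring; // 200 > C1: would prune
> double sc3Final = 0.2 * rowCountAfterExploring;  //   2 < C1: op3 may win
> {code}
> Pruning on sc3Early would discard a sub-plan that, with final metadata, 
> beats the current winner.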
> In the past year, we tried several ways to solve the problem. For example, 
> we tried to make Calcite's logical properties stable, as Xiening proposed, 
> but the proposal was rejected because changes of metadata after 
> transformations are natural. We also tried to identify, by categories or 
> annotations, rules that never change the logical properties, and to give 
> up pruning for the other rules. But we failed again because it introduced 
> too much complexity for rule designers.
> Those failures drove us to reconsider the problem from its essence: if we 
> cannot make SC3 stable, what if we give up using SC3 and leverage other 
> costs for pruning?
> Here is a simple description of the new idea. After obtaining C1, we 
> eagerly build op3 and op4, without further exploration of them. Because 
> op4's input, the scan, was fully optimized during the optimization of op1, 
> we can compute a stable cumulative cost of impl-op4. Let's denote it C4. 
> If we find that C4 is already greater than C1, then we know impl-op4 will 
> never be part of the best plan, and some optimization steps can be skipped 
> (to keep it simple, let impl-op4 be the only input of impl-op3):
> 1. The enforcement rules among impl-sink, impl-op3 and impl-op4, as well 
> as trait pass-through. These steps were not handled properly in the 
> previous version.
> 2. The trait derivation of impl-op4 and impl-op3.
> 3. The exploration of op3, if the substitutions produced by exploration 
> always use op4 as input. This is the key to logical pruning; I will 
> explain it in more detail later on.
> Note that the exploration of op4 is not pruned, as we don't know whether 
> op4's other alternatives would yield a lower cost. Moreover, the 
> implementation of op3 is not skipped, as it has already been applied. But 
> the implementation of other alternatives of op3 can be skipped if the 
> exploration is pruned.
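> The sketch below restates this check, again with illustrative names and 
> numbers only, not Calcite API. The point is that C4 is built bottom-up 
> from a fully optimized input, so it is stable and safe to compare with C1:
> {code:java}
> double c1 = 100.0;          // winner cost on op1's side of the set
> double scanBest = 60.0;     // the scan below impl-op4 is fully optimized
> double op4SelfCost = 50.0;  // self-cost of impl-op4
>
> double c4 = op4SelfCost + scanBest; // stable cumulative cost: 110
> boolean prune = c4 > c1;            // true: 110 > 100
> // When true, the driver may skip: enforcement rules and trait
> // pass-through around impl-sink/impl-op3/impl-op4, trait derivation of
> // impl-op3 and impl-op4, and, if every substitution keeps op4 as input,
> // the exploration of op3.
> {code}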
> The new solution is a hybrid of top-down and bottom-up optimization. 
> Optimization requests with cost limits are passed down in a top-down 
> manner, while cost propagation and pruning take place in a bottom-up 
> manner. It ensures that when a logical node is explored, it has already 
> been built (if it can be), and all of its implementations' inputs are 
> fully optimized. Besides better space pruning, this design brings some 
> other benefits:
> 1. When deriving traits, knowledge about inputs is stable and concrete. 
> Trait derivation is executed only once for each node (except for some 
> corner cases). And the OMAKASE mode is fully supported.
> 2. No more cost-improvement propagation in subsets, and hence no more 
> RelMetadataQuery cache invalidations.
> 3. Exploration rules and enforcement rules can use the metadata query 
> without worrying about regressions: it is ensured that the input metadata 
> is complete and stable.
> Following the new idea, we implemented a new TopDownRuleDriver. Note that 
> the new version is very different from the previous one, so the Calcite 
> community may decide whether and how to merge it.
> The condition "if the substitutions always use op4 as input" is easier 
> said than implemented. For example, op3 can transform to op5 via an 
> exploration rule, and op5 <- op4 can transform to op6 <- op7. In this 
> case, op6 <- op7 no longer uses op4 as input, so the transformation from 
> op3 to op5 should take place. How do we identify such cases?
> An ideal solution might be to look into the rule set, build the network of 
> node transformations, determine that no rule accepts the pattern 
> op5 <- op4, and then decide that the transformation of op3 to op5 can be 
> skipped.
> However, Calcite's interface does not allow the rule driver to know what 
> kinds of node a transformation may produce without invoking the rule. That 
> is, we cannot know in advance that the exploration of op3 will produce op5 
> or any other node. So we use a simpler but weaker solution: we only check 
> whether there are rules that match op4 as a non-root operand 
> (RelOptRuleOperand's ordinalInRule is not 0). This solution is not 
> perfect, of course. There could be further improvements.
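> As a rough illustration, the check can be phrased against Calcite's 
> existing rule-operand API as below; matchedAsNonRoot is a hypothetical 
> helper written for this description, not part of the PR:
> {code:java}
> import java.util.Collection;
> import org.apache.calcite.plan.RelOptRule;
> import org.apache.calcite.plan.RelOptRuleOperand;
> import org.apache.calcite.rel.RelNode;
>
> /** Can any rule match {@code rel} as a non-root operand, i.e. as an
>  * input of some larger pattern? */
> static boolean matchedAsNonRoot(RelNode rel, Collection<RelOptRule> rules) {
>   for (RelOptRule rule : rules) {
>     for (RelOptRuleOperand operand : rule.getOperands()) {
>       // ordinalInRule == 0 identifies the root operand of the pattern.
>       if (operand.ordinalInRule != 0 && operand.matches(rel)) {
>         return true; // some rule may consume rel as an input later
>       }
>     }
>   }
>   return false;
> }
> {code}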
> One may doubt the correctness of the assumption that op5 always has op4 as 
> its input if there is no rule matching xx <- op4. Exceptions do exist when 
> a rule discards its inputs, such as FilterReduceExpressionsRule, which 
> transforms the whole subtree into an empty Values when it finds that the 
> condition is always false. Luckily, I have only seen substitution rules 
> behave this way, and substitution rules are always applied.
> Another consideration: although we cannot identify whether an exploration 
> will have subsequent explorations, experience tells us that this is quite 
> rare, especially when substitution rules have already been performed. So 
> it is sometimes useful to just give up such explorations for the sake of 
> performance. We therefore provide an option, aggressivePruning; when it is 
> turned on, the pruning no longer considers the subsequent effects of an 
> exploration. Users may turn on this feature if they want better 
> performance. Note that this option is ON by default in the code, as we 
> found little regression during testing.
> To make the pruning more efficient, we introduce another rule: when 
> optimizing a set, the subset with the default distribution and default 
> collation is optimized first. This rule does not introduce regressions, 
> because there should always be a converter from the default subset to a 
> concrete subset, and the default subset always has a cost no larger than 
> the concrete subset (because all rels in the concrete subset also belong 
> to the default subset). But it brings a lot of benefits:
> - When computing the lower bound for an exploration rule, we do not know 
> which physical properties the future implementations will require, so the 
> default subset provides a lower-bound cost.
> - Logical nodes can be completely ignored while optimizing other subsets, 
> as they must already have been optimized when the default subset was 
> optimized.
> - We can still use the root cost of a subtree (SC3 in the example) to 
> prune enforcement rules, because when applying enforcement rules we know 
> that all inputs are fully optimized.
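> The first point can again be shown with illustrative numbers (not Calcite 
> API): every concrete plan is a default plan plus, at most, an enforcer, so 
> the default-subset winner is a valid lower bound for every concrete subset 
> of the same set.
> {code:java}
> double defaultSubsetBest = 80.0; // winner cost of the default subset
> double costLimit = 60.0;         // upper bound passed down by the parent
>
> boolean prune = defaultSubsetBest > costLimit; // true: 80 > 60
> // No plan in any concrete subset of this set can meet the limit,
> // whatever distribution or collation the parent requests.
> {code}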
> Another interesting topic is introducing parallel optimization, like Orca 
> does. The new driver introduces a TaskScheduler that manages tasks. 
> Ideally, it could record the dependencies between tasks and schedule leaf 
> tasks in parallel. However, the current version does not implement such a 
> parallel scheduler, because most data structures of VolcanoPlanner are not 
> thread-safe. Community members who are interested in parallel optimization 
> could help improve it.
> [1] 
> https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf
> [2] http://web.cecs.pdx.edu/~len/IDEAS01.pdf



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
