Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

Vladimir Ozerov Sat, 14 Mar 2020 13:20:00 -0700

Hi Haisheng,

You are right, the behavior I showed is not the default one, I should
provide more context here. This is how we (Hazelcast) and at least Drill
use the engine to ensure that the produced plan is optimal, I gave an
example in [1].
In real distributed engines, we rely on physical properties heavily. Most
notably, distribution and collation. And the fundamental problem with the
VolcanoOptimizer, is that it cannot propagate traits in a controlled
manner. This, in turn, forces us to use AbstractConverters and implement
rules in ways, which appear strange to Calcite community :-). And this, in
turn, leads to excessive rule calls and failure to plan complex queries.


Let's consider the same tree again, but now assuming that this is not the
complete tree, but a subtree, and there are some parent operators. Let's
also assume that the ScanRule may produce two equivalent operators with
different physical properties: PhysicalScan and PhysicalIndexScan[a ASC].
It is important to consider both alternatives in parent operators. Now
let's consider two different ways to optimize that subtree.

1. Canonical Calcite way (default)
1.1 Perform initial rules ordering, parent rules fire first: [ProjectRule,
ScanRule]
1.2 Invoke ProjectRule, which produces physical project without any
physical traits
1.3 Invoke ScanRule, which produces, PhysicalScan and PhysicalIndexScan[a
ASC]
Since ProjectRule was not aware of different scan alternatives, it missed
collation, and resulting hypergraph looks like this:

-- PhysicalProject
---- [PhysicalScan, PhysicalIndexScan[a ASC]]

This plan is suboptimal, because of parent operators cannot take advantage
of collation.

2. Hazelast/Drill way:
2.1 Enable abstract converters
2.2 Rules are ordered in the same way as in example 1: [ProjectRule,
ScanRule]
2.3 ProjectRule fires, enumerates physical implementations of the input.
Since there are no physical inputs yet, it exits without any transformations
2.4 ScanRule fires and produces two physical scans
2.5 Abstract converters ensure that the ProjectRule is scheduled for
execution again because it's input RelSet has new nodes
2.6 ProjectRule fires again, now having physical inputs, and generates one
implementation per unique combination of physical input traits.

As a result, we have two graphs now:

Graph 1:
-- PhysicalProject
---- PhysicalScan

Graph 2:
-- PhysicalProject[a ASC]
---- PhysicalIndexScan[a ASC]

Notice how we propagated physical collation. Now parent operators may take
advantage of it, e.g. eliminate sorting, or perform streaming aggregation
instead of blocking hash aggregation.

This is the fundamental problem we have in Hazelcast: how to ensure the
complete exploration of the search space without excessive rule calls.

Very short summary:
1. The default behavior of VolcanoOptimizer cannot explore the physical
search space, so plans are not optimal
2. Abstract converters fix this if you follow a certain pattern in rule
implementations (see 2.3), but generate too many rule calls, so join
planning rules cannot be called together with other rules, which again lead
to not optimal plans (yet, better than with p.1)
3. "Trait pull-up" proposal may fix it. But I have a feeling that pulling
up possible trait combinations from a child node is indistinguishable from
child node exploration, so it may be not very efficient again
4. A brand new optimizer implementation with recursive top-down approach
may address all the concerns from p.1-p.3, but appears to be complex to
implement and may be incompatible with existing rules

Not an easy choice.

Regards,
Vladimir.

[1]
https://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3CCAJJmzpU9_75O48WeEKgHKg3fTMhK3eAMmHNOVvczj_uUTBxHkA%40mail.gmail.com%3E

сб, 14 мар. 2020 г. в 21:53, Haisheng Yuan <[email protected]>:

> I agree that there should be no global rule queue, we should it do it on
> per-operator basis, which is also how other major Cascades frameworks do.
>
> However, Calcite's VolcanoPlanner doesn't generate unnecessary rule calls
> as you described. The current process is:
> 1. global rule queue: ScanRule, ProjectRule
> 2. Call ScanRule, produce physical scan
> 3. Call ProjectRule, produce physical project.
>
> Even with global rule queue of reversed order ProjectRule, ScanRule, there
> are still 2 rule calls. In your step 2, ProjectRule doesn't produce
> physical node, which is incorrect. Any rule is, and should be independent
> with each other rule. If your rule relies on other operators or rules to be
> explored first, then you should think about it twice.
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人：Vladimir Ozerov<[email protected]>
> 日 期：2020年03月15日 01:50:10
> 收件人：[email protected] ([email protected])<[email protected]
> >
> 主 题：Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching
> specific rels
>
> Hi Roman,
>
> In my understanding, the proposed minor changes may only decrease the total
> number of rule invocations slightly, but all principal problems remain the
> same. In the top-down approach, you do not want to implement bottom logical
> nodes unless it is requested explicitly by a parent operator.
>
> It seems to me that the very first step to efficient optimizer could be a
> new rule invocation infrastructure. There should be *no global rule queue*
> at all. Instead, we may introduce the per-node rule queue. Then, the
> optimizer performs a recursive top-down optimization dive, invoking the
> rules for every operator. Consider the following simple tree:
> -- LogicalProject
> ---- LogicalScan
>
> Assuming that we have two implementation rules ProjectRule, ScanRule, and
> abstract converters enabled, VolcanoOptimizer will proceed as follows,
> generating one unnecessary rule call:
> 1. Define global rule call queue: ProjectRule, ScanRule
> 2. Call ProjectRule, no new nodes are produced
> 3. Call ScanRule, produce physical scans, reschedule ProjectRule
> 4. Call ProjectRule again, produce the physical project
>
> With the proposed approach, it will work differently:
> 1. Define per-operator queues:
> LogicalProject -> ProjectRule
> LogicalScan -> ScanRule
> 2. Call optimize(LogicalProject)
> 3. Invoke ProjectRule, which calls optimize(LogicalScan) on the input
> 4. Invoke ScanRule, produce physical scans, return control back to
> ProjectRule
> 5. Produce the physical project, finish optimization
>
> Now we have only 2 rule invocations as expected, and we reached the same
> result as in the proposed minor changes. But the crucial difference is that
> now we have well-defined control flow between operators: start at the top,
> delegate to children. With this infrastructure in place, we will be able to
> introduce more complex features, such as pruning, or partial exploration
> later on.
>
> But notice that this change will be incompatible with the current rules,
> i.e. they should be re-written for the new optimization algorithm (e.g. see
> step 3), which might be a blocker for the current Calcite users. So maybe
> it is better to think of a new optimizer, rather than fixing
> VolcanoOptimizer.
>
> Regards,
> Vladimir.
>
>
> вт, 14 янв. 2020 г. в 23:52, Haisheng Yuan <[email protected]>:
>
> > On the other hand, if we don't preprocess and normalize the rel
> expression
> > before going to valcano planner, still compute and keep
> logical/relational
> > properties, like cardinality, for each operator, how can we achieve group
> > seach space pruning? Given a physical group expression, its required
> > property and upper bound cost C_limit, we need to get its child group
> > cardinality, compute local cost C_local, so that we can calculate the
> > child group's upper bound cost and pass it down.
> >
> > I can't figure out how we can do group pruning without shared relational
> > properties.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人：Haisheng Yuan<[email protected]>
> > 日 期：2020年01月14日 12:06:17
> > 收件人：[email protected]<[email protected]>
> > 主 题：Re: Re: [DISCUSS] Proposal to add API to force rules matching
> specific
> > rels
> >
> > The example is valid if Calcite doesn't do normalization or preprocessing
> > before going to VolcanoPlanner.
> > But many databases and big data systems will try to preprocess the
> > expression (push down predicates etc.) so that expressions in the same
> > group can share the logical properties, for most case if not all. You may
> > argue that it should be cost based, e.g. evaluating filter early can
> still
> > be bad. It is true, but how accurate will the statistics be, how accurate
> > will the cost model be?
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人：Julian Hyde<[email protected]>
> > 日 期：2020年01月13日 08:44:54
> > 收件人：[email protected]<[email protected]>
> > 主 题：Re: [DISCUSS] Proposal to add API to force rules matching specific
> rels
> >
> > > MEMO group (RelSet) represents logically equivalent expressions.
> > > All the expressions in one group should share the same logical
> > > properties, e.g. functional dependency, constraint properties etc.
> > > But they are not sharing it. Computation is done for each individual
> > operator.
> >
> > It's good, and correct, that we compute for each individual operator.
> >
> > Suppose that a RelSubset s contains RelNode r1 and we know that the
> > constraint x > 0 holds. Suppose that we also have r2 with constraint y
> > < 10, and we discover that r1 and r2 are equivalent and belong
> > together in s. Now we can safely say that both constraints (x > 0 and
> > y < 10) apply to both r1 and r2.
> >
> > Deducing additional constraints in this way is a big win. The effort
> > to compute constraints for each RelNode is well-spent.
> >
> > This kind of deduction applies to other logical properties (e.g.
> > unique keys) and it applies to RelSet as well as RelSubset.
> >
> > Julian
> >
> >
> > On Sun, Jan 12, 2020 at 10:10 AM Roman Kondakov
> > <[email protected]> wrote:
> > >
> > > @Haisheng
> > >
> > > > Calcite uses Project operator and all kinds of ProjectXXXTranposeRule
> > to prune unused columns.
> > >
> > > I also noticed that in most cases Project-related rules took
> significant
> > > part of the planning time. But I didn't explore this problem yet.
> > >
> > > > MEMO group (RelSet) represents logically equivalent expressions. All
> > the expressions in one group should share the same logical properties,
> e.g.
> > functional dependency, constraint properties etc. But they are not
> sharing
> > it. Computation is done for each individual operator.
> > >
> > > I thought the equivalence of logical properties within groups (RelSets)
> > > are implicit. For example, in RelSet#addInternal it is always verified
> > > that the new added node has the same type as other members of the set.
> > >
> > > Anyway I absolutely agree with you that problems with traits
> > > propagation, rules matching and other that you mentioned in the
> previous
> > > e-mails should be solved in the first place. We need first to make
> > > Volcano optimizer work right and only then make some improvements like
> > > search space pruning.
> > >
> > > I would love to join this work to improve Volcano planner. Looking
> > > forward for design doc.
> > >
> > >
> > > --
> > > Kind Regards
> > > Roman Kondakov
> > >
> > >
> > > On 11.01.2020 11:42, Haisheng Yuan wrote:
> > > > Currently, Calcite uses Project operator and all kinds of
> > ProjectXXXTranposeRule to prune unused columns. Every operator's output
> > columns use index to reference child operators' columns. If there is a
> > Project operator with child operator of a Filter, if we push project down
> > under Filter, we will have Project-Filter-Project-FilterInput. All the
> > newly generated relnodes will trigger rule matches. e.g. If we already
> did
> > ReduceExpressionRule on filter, but due to the new filter RexCall's input
> > ref index changed, we have to apply ReduceExpressionRule on the new
> filter
> > again, even there is nothing can be reduced. Similarly new operator
> > transpose/merge rule will be triggered. This can trigger a lot of rule
> > matches.
> > > >
> > > > MEMO group (RelSet) represents logically equivalent expressions. All
> > the expressions in one group should share the same logical properties,
> e.g.
> > functional dependency, constraint properties etc. But they are not
> sharing
> > it. Computation is done for each individual operator.
> > > >
> > > > Without resolving those issue, space pruning won't help much.
> > > >
> > > > There are a lot of room for improvement. Hope the community can join
> > the effort to make Calcite better.
> > > >
> > > > - Haisheng
> > > >
> > > > ------------------------------------------------------------------
> > > > 发件人：Roman Kondakov<[email protected]>
> > > > 日 期：2020年01月10日 19:39:51
> > > > 收件人：<[email protected]>
> > > > 主 题：Re: [DISCUSS] Proposal to add API to force rules matching
> specific
> > rels
> > > >
> > > > @Haisheng, could you please clarify what you mean by these points?
> > > >
> > > >> - the poor-design of column pruning,
> > > >> - lack of group properties etc.
> > > >
> > > > I guess I'm not aware of these problems.
> > > >
> >
> >
>
>

Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

Reply via email to