Hi Vladimir,

I find it completely reasonable to propagate only supported traits and not
all kinds of them.

Per the classic Volcano paper [1], "enforcers" (I suppose "implementation
algorithms" as well) can ensure multiple physical properties and destroy
some others, which is also inline with what you propose.
In terms of implementation I don't know what this change might break but
seems the correct thing to do.
If I remember well trait propagation is under a boolean flag so just make
sure that the change makes sense also when the property is disabled.

Best,
Stamatis

[1]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf

On Tue, May 4, 2021 at 9:37 AM Vladimir Ozerov <[email protected]> wrote:

> Hi Haisheng,
>
> My original problem was with how Enumerable propagates traits. Many
> Enumerable rules copy traits from the child operator. This seems wrong
> because, as you mentioned, Enumerable supports only collation. Propagation
> of the unsupported traits may lead to CannotPlanException as in the example
> above when having a plan with multiple conventions.
>
> Therefore, the proposal is to change Enumerable rules, so that they
> propagate only collation, but not other traits. Does it make sense?
>
> Regards,
> Vladimir.
>
> ср, 21 апр. 2021 г. в 04:31, Haisheng Yuan <[email protected]>:
>
> > Hi Vladimir,
> >
> > > There are two problems here. First, the project operator potentially
> > > destroys any trait which depends on column order, such as distribution
> or
> > > collation. Therefore, EnumerableProject has an incorrect value of the
> > > distribution trait.
> >
> > The enumerable convention is intended for in-memory, non-distributed
> > environment.
> > Therefore, we only consider 2 traits: collation and convention. Other
> > traits are not
> > guaranteed to work correctly. If you want it work with distribution, you
> > have to create
> > your own operators, rules, either by extending or overriding, in which
> > case, you will need
> > to remap distribution columns to get the correct distribution trait, just
> > like how collation does.
> >
> > > Second, which distribution should I assign to the CustomToEnumerable
> > node?
> > > As I know that parent convention cannot handle the distribution
> properly,
> > > my natural thought is to set it to ANY.
> >
> > You can assume CustomToEnumerable to be an Enforcer operator, like Sort,
> > Exchange.
> > Sort only changes data collation, Exchange changes data distribution and
> > collation, similarly
> > CustomToEnumerable only change convention, but retains collation and
> > distribution, I assume.
> > But in practice, it should be decided by the operator inventor and the
> > underlying physical
> > implementation.
> >
> > Hope that answers your question. Feel free to ask if you have more
> > questions.
> >
> > Thanks,
> > Haisheng Yuan
> >
> > On 2021/03/27 08:43:15, Vladimir Ozerov <[email protected]> wrote:
> > > Hi,
> > >
> > > Apache Calcite supports heterogeneous optimization when nodes may have
> > > different conventions. The Enumerable rules propagate all traits from
> > > inputs. We have doubts whether this is correct or not.
> > >
> > > Consider the following initial plan, which was created by Apache
> Calcite
> > > after sql-to-rel conversion and invocation of TranslatableTable.toRel.
> > The
> > > table is in the CUSTOM convention. In this convention, there is an
> > > additional Distribution trait that tracks which attribute is used for
> > > sharding. It could be either SHARDED or ANY. The latter is the default
> > > distribution value which is used when the distribution is unknown.
> > Suppose
> > > that the table is distributed by the attribute $0.
> > > LogicalProject [convention=NONE,   distribution=ANY]
> > >   CustomTable  [convention=CUSTOM, distribution=SHARDED($0)]
> > >
> > > Now suppose that we run VolcanoPlanner with two rules:
> > EnumerableProjectRule
> > > and converter rules that translate the CUSTOM node to ENUMERABLE node.
> > > First, the EnumerableProjectRule is executed. This rule propagates
> traits
> > > from the input, replacing only convention. Notice, how it propagated
> the
> > > distribution trait.
> > > EnumerableProject [convention=ENUMERABLE, distribution=SHARDED($0)]
> > >   CustomTable     [convention=CUSTOM,     distribution=SHARDED($0)]
> > >
> > > Next, the converter will be invoked, yielding the following final plan:
> > > EnumerableProject    [convention=ENUMERABLE, distribution=SHARDED($0)]
> > >   CustomToEnumerable [convention=ENUMERABLE, distribution=???]
> > >     CustomTable      [convention=CUSTOM,     distribution=SHARDED($0)]
> > >
> > > There are two problems here. First, the project operator potentially
> > > destroys any trait which depends on column order, such as distribution
> or
> > > collation. Therefore, EnumerableProject has an incorrect value of the
> > > distribution trait.
> > > Second, which distribution should I assign to the CustomToEnumerable
> > node?
> > > As I know that parent convention cannot handle the distribution
> properly,
> > > my natural thought is to set it to ANY. However, at least in the
> top-down
> > > optimizer, this will lead to CannotPlanException, unless I declare that
> > [ANY
> > > satisfies SHARDED($0)], which is not the case: ANY is unknown
> > distribution,
> > > so all distribution satisfies ANY, but not vice versa.
> > >
> > > My question is - shouldn't we ensure that only the collation trait is
> > > propagated from child nodes in Enumerable rules? For example, in the
> > > EnumerableProjectRule instead of doing:
> > > input.getTraitSet()
> > >   .replace(EnumerableConvention.INSTANCE)
> > >   .replace(<newCollation>)
> > >
> > > we may do:
> > > RelOptCluster.traitSet().
> > >   .replace(EnumerableConvention.INSTANCE)
> > >   .replace(<newCollation>)
> > >
> > > This would ensure that all other traits are set to the default value.
> The
> > > generalization of this idea is that every convention has a set of
> > supported
> > > traits. Every unsupported trait should be set to the default value.
> > >
> > > I would appreciate your feedback on the matter.
> > >
> > > Regards,
> > > Vladimir.
> > >
> >
>

Reply via email to