Hi Vladimir, I find it completely reasonable to propagate only supported traits and not all kinds of them.
Per the classic Volcano paper [1], "enforcers" (I suppose "implementation algorithms" as well) can ensure multiple physical properties and destroy some others, which is also inline with what you propose. In terms of implementation I don't know what this change might break but seems the correct thing to do. If I remember well trait propagation is under a boolean flag so just make sure that the change makes sense also when the property is disabled. Best, Stamatis [1] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf On Tue, May 4, 2021 at 9:37 AM Vladimir Ozerov <[email protected]> wrote: > Hi Haisheng, > > My original problem was with how Enumerable propagates traits. Many > Enumerable rules copy traits from the child operator. This seems wrong > because, as you mentioned, Enumerable supports only collation. Propagation > of the unsupported traits may lead to CannotPlanException as in the example > above when having a plan with multiple conventions. > > Therefore, the proposal is to change Enumerable rules, so that they > propagate only collation, but not other traits. Does it make sense? > > Regards, > Vladimir. > > ср, 21 апр. 2021 г. в 04:31, Haisheng Yuan <[email protected]>: > > > Hi Vladimir, > > > > > There are two problems here. First, the project operator potentially > > > destroys any trait which depends on column order, such as distribution > or > > > collation. Therefore, EnumerableProject has an incorrect value of the > > > distribution trait. > > > > The enumerable convention is intended for in-memory, non-distributed > > environment. > > Therefore, we only consider 2 traits: collation and convention. Other > > traits are not > > guaranteed to work correctly. If you want it work with distribution, you > > have to create > > your own operators, rules, either by extending or overriding, in which > > case, you will need > > to remap distribution columns to get the correct distribution trait, just > > like how collation does. > > > > > Second, which distribution should I assign to the CustomToEnumerable > > node? > > > As I know that parent convention cannot handle the distribution > properly, > > > my natural thought is to set it to ANY. > > > > You can assume CustomToEnumerable to be an Enforcer operator, like Sort, > > Exchange. > > Sort only changes data collation, Exchange changes data distribution and > > collation, similarly > > CustomToEnumerable only change convention, but retains collation and > > distribution, I assume. > > But in practice, it should be decided by the operator inventor and the > > underlying physical > > implementation. > > > > Hope that answers your question. Feel free to ask if you have more > > questions. > > > > Thanks, > > Haisheng Yuan > > > > On 2021/03/27 08:43:15, Vladimir Ozerov <[email protected]> wrote: > > > Hi, > > > > > > Apache Calcite supports heterogeneous optimization when nodes may have > > > different conventions. The Enumerable rules propagate all traits from > > > inputs. We have doubts whether this is correct or not. > > > > > > Consider the following initial plan, which was created by Apache > Calcite > > > after sql-to-rel conversion and invocation of TranslatableTable.toRel. > > The > > > table is in the CUSTOM convention. In this convention, there is an > > > additional Distribution trait that tracks which attribute is used for > > > sharding. It could be either SHARDED or ANY. The latter is the default > > > distribution value which is used when the distribution is unknown. > > Suppose > > > that the table is distributed by the attribute $0. > > > LogicalProject [convention=NONE, distribution=ANY] > > > CustomTable [convention=CUSTOM, distribution=SHARDED($0)] > > > > > > Now suppose that we run VolcanoPlanner with two rules: > > EnumerableProjectRule > > > and converter rules that translate the CUSTOM node to ENUMERABLE node. > > > First, the EnumerableProjectRule is executed. This rule propagates > traits > > > from the input, replacing only convention. Notice, how it propagated > the > > > distribution trait. > > > EnumerableProject [convention=ENUMERABLE, distribution=SHARDED($0)] > > > CustomTable [convention=CUSTOM, distribution=SHARDED($0)] > > > > > > Next, the converter will be invoked, yielding the following final plan: > > > EnumerableProject [convention=ENUMERABLE, distribution=SHARDED($0)] > > > CustomToEnumerable [convention=ENUMERABLE, distribution=???] > > > CustomTable [convention=CUSTOM, distribution=SHARDED($0)] > > > > > > There are two problems here. First, the project operator potentially > > > destroys any trait which depends on column order, such as distribution > or > > > collation. Therefore, EnumerableProject has an incorrect value of the > > > distribution trait. > > > Second, which distribution should I assign to the CustomToEnumerable > > node? > > > As I know that parent convention cannot handle the distribution > properly, > > > my natural thought is to set it to ANY. However, at least in the > top-down > > > optimizer, this will lead to CannotPlanException, unless I declare that > > [ANY > > > satisfies SHARDED($0)], which is not the case: ANY is unknown > > distribution, > > > so all distribution satisfies ANY, but not vice versa. > > > > > > My question is - shouldn't we ensure that only the collation trait is > > > propagated from child nodes in Enumerable rules? For example, in the > > > EnumerableProjectRule instead of doing: > > > input.getTraitSet() > > > .replace(EnumerableConvention.INSTANCE) > > > .replace(<newCollation>) > > > > > > we may do: > > > RelOptCluster.traitSet(). > > > .replace(EnumerableConvention.INSTANCE) > > > .replace(<newCollation>) > > > > > > This would ensure that all other traits are set to the default value. > The > > > generalization of this idea is that every convention has a set of > > supported > > > traits. Every unsupported trait should be set to the default value. > > > > > > I would appreciate your feedback on the matter. > > > > > > Regards, > > > Vladimir. > > > > > >
