Hi Haisheng, My original problem was with how Enumerable propagates traits. Many Enumerable rules copy traits from the child operator. This seems wrong because, as you mentioned, Enumerable supports only collation. Propagation of the unsupported traits may lead to CannotPlanException as in the example above when having a plan with multiple conventions.
Therefore, the proposal is to change Enumerable rules, so that they propagate only collation, but not other traits. Does it make sense? Regards, Vladimir. ср, 21 апр. 2021 г. в 04:31, Haisheng Yuan <[email protected]>: > Hi Vladimir, > > > There are two problems here. First, the project operator potentially > > destroys any trait which depends on column order, such as distribution or > > collation. Therefore, EnumerableProject has an incorrect value of the > > distribution trait. > > The enumerable convention is intended for in-memory, non-distributed > environment. > Therefore, we only consider 2 traits: collation and convention. Other > traits are not > guaranteed to work correctly. If you want it work with distribution, you > have to create > your own operators, rules, either by extending or overriding, in which > case, you will need > to remap distribution columns to get the correct distribution trait, just > like how collation does. > > > Second, which distribution should I assign to the CustomToEnumerable > node? > > As I know that parent convention cannot handle the distribution properly, > > my natural thought is to set it to ANY. > > You can assume CustomToEnumerable to be an Enforcer operator, like Sort, > Exchange. > Sort only changes data collation, Exchange changes data distribution and > collation, similarly > CustomToEnumerable only change convention, but retains collation and > distribution, I assume. > But in practice, it should be decided by the operator inventor and the > underlying physical > implementation. > > Hope that answers your question. Feel free to ask if you have more > questions. > > Thanks, > Haisheng Yuan > > On 2021/03/27 08:43:15, Vladimir Ozerov <[email protected]> wrote: > > Hi, > > > > Apache Calcite supports heterogeneous optimization when nodes may have > > different conventions. The Enumerable rules propagate all traits from > > inputs. We have doubts whether this is correct or not. > > > > Consider the following initial plan, which was created by Apache Calcite > > after sql-to-rel conversion and invocation of TranslatableTable.toRel. > The > > table is in the CUSTOM convention. In this convention, there is an > > additional Distribution trait that tracks which attribute is used for > > sharding. It could be either SHARDED or ANY. The latter is the default > > distribution value which is used when the distribution is unknown. > Suppose > > that the table is distributed by the attribute $0. > > LogicalProject [convention=NONE, distribution=ANY] > > CustomTable [convention=CUSTOM, distribution=SHARDED($0)] > > > > Now suppose that we run VolcanoPlanner with two rules: > EnumerableProjectRule > > and converter rules that translate the CUSTOM node to ENUMERABLE node. > > First, the EnumerableProjectRule is executed. This rule propagates traits > > from the input, replacing only convention. Notice, how it propagated the > > distribution trait. > > EnumerableProject [convention=ENUMERABLE, distribution=SHARDED($0)] > > CustomTable [convention=CUSTOM, distribution=SHARDED($0)] > > > > Next, the converter will be invoked, yielding the following final plan: > > EnumerableProject [convention=ENUMERABLE, distribution=SHARDED($0)] > > CustomToEnumerable [convention=ENUMERABLE, distribution=???] > > CustomTable [convention=CUSTOM, distribution=SHARDED($0)] > > > > There are two problems here. First, the project operator potentially > > destroys any trait which depends on column order, such as distribution or > > collation. Therefore, EnumerableProject has an incorrect value of the > > distribution trait. > > Second, which distribution should I assign to the CustomToEnumerable > node? > > As I know that parent convention cannot handle the distribution properly, > > my natural thought is to set it to ANY. However, at least in the top-down > > optimizer, this will lead to CannotPlanException, unless I declare that > [ANY > > satisfies SHARDED($0)], which is not the case: ANY is unknown > distribution, > > so all distribution satisfies ANY, but not vice versa. > > > > My question is - shouldn't we ensure that only the collation trait is > > propagated from child nodes in Enumerable rules? For example, in the > > EnumerableProjectRule instead of doing: > > input.getTraitSet() > > .replace(EnumerableConvention.INSTANCE) > > .replace(<newCollation>) > > > > we may do: > > RelOptCluster.traitSet(). > > .replace(EnumerableConvention.INSTANCE) > > .replace(<newCollation>) > > > > This would ensure that all other traits are set to the default value. The > > generalization of this idea is that every convention has a set of > supported > > traits. Every unsupported trait should be set to the default value. > > > > I would appreciate your feedback on the matter. > > > > Regards, > > Vladimir. > > >
