Hi Haisheng,

Thank you for the advice. This is exactly how the distribution is designed at
the moment (approach 2 from my original email): as a List<int[]> instead of
just an int[]. My main concern was the increased complexity of trait
propagation/derivation, as I have to manage these nested lists by hand.
Nevertheless, it works well, so I hoped there were better built-in approaches
I could use. If the answer is negative, I'll continue using the original
approach, with the multiple alternatives managed manually.
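
To illustrate, here is a minimal sketch of that manually managed approach. The
class and method names (MultiDistribution, satisfies, forEquiJoin) are
hypothetical, not Calcite API; it only shows the idea of carrying a
List<int[]> of equivalent key alternatives and checking requirements by hand:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Calcite API) of a distribution trait that
 * carries several equivalent sets of sharding keys as a List<int[]>.
 */
final class MultiDistribution {
    // Each element is one equivalent set of sharding keys, e.g. [a1] or [b1].
    private final List<int[]> alternatives;

    MultiDistribution(List<int[]> alternatives) {
        this.alternatives = alternatives;
    }

    /** The required distribution is satisfied if any alternative matches. */
    boolean satisfies(int[] requiredKeys) {
        for (int[] keys : alternatives) {
            if (Arrays.equals(keys, requiredKeys)) {
                return true;
            }
        }
        return false;
    }

    /** An equijoin exposes both sides' keys as equivalent alternatives. */
    static MultiDistribution forEquiJoin(int[] leftKeys, int[] rightKeys) {
        List<int[]> alts = new ArrayList<>();
        alts.add(leftKeys);
        alts.add(rightKeys);
        return new MultiDistribution(alts);
    }

    public static void main(String[] args) {
        // DistributedJoin[a.a1=b.b1]: SHARDED on column 0 (a1) and column 3 (b1).
        MultiDistribution join = forEquiJoin(new int[]{0}, new int[]{3});
        // Aggregate[group=b1] requires SHARDED[3]: co-location is detected.
        System.out.println(join.satisfies(new int[]{3})); // true
        System.out.println(join.satisfies(new int[]{1})); // false
    }
}
```

The nesting Haisheng's suggestion avoids shows up as soon as passThrough/derive
must remap every alternative through projections, which is exactly the
bookkeeping described above.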

Regards,
Vladimir.

Tue, May 25, 2021 at 20:30, Haisheng Yuan <[email protected]>:

> Hi Vladimir,
>
> Glad to see you raised the question.
>
> Here is the advice:
> Do not use RelMultipleTrait/RelCompositeTrait; it is fundamentally
> flawed and has many bugs. It cannot work properly with either top-down or
> bottom-up optimization.
>
> Instead, we need to add an equivalent-keys bitmap as a property of the
> physical trait, such as RelCollation or RelDistribution.
>
> For example:
> class RelDistributionImpl {
>   // list of distribution keys
>   private ImmutableIntList keys;
>
>   // equivalent-key bitset for each distribution key
>   private ImmutableList<ImmutableBitSet> equivBitSets;
> }
>
> In trait satisfaction and column remapping, we also need to take equivalent
> keys into consideration. Some of this work needs to be done in the Calcite
> core framework.
>
> Greenplum's Orca optimizer has a similar strategy:
>
> https://github.com/greenplum-db/gporca/blob/master/libgpopt/include/gpopt/base/CDistributionSpecHashed.h#L44
>
> Regards,
> Haisheng Yuan
>
> On 2021/05/25 15:37:32, Vladimir Ozerov <[email protected]> wrote:
> > Hi,
> >
> > Consider a distributed SQL engine that uses a distribution property to
> > model exchanges, and consider the following physical tree. To do the
> > distributed join, we co-locate tuples using the equijoin key. Now the
> > Join operator has two equivalent distributions, [a1] and [b1]. It is
> > critical to expose both distributions so that the top Aggregate can take
> > advantage of the co-location.
> >
> > Aggregate[group=b1]
> >   DistributedJoin[a.a1=b.b1]   // SHARDED[a1], SHARDED[b1]
> >     Input[a]                   // SHARDED[a1]
> >     Input[b]                   // SHARDED[b1]
> >
> > A similar example for the Project:
> > Aggregate[group=$1]
> >   Project[$0=a, $1=a] // SHARDED[$0], SHARDED[$1]
> >     Input             // SHARDED[a]
> >
> > The question is how to model this situation properly?
> >
> > First, it seems that RelMultipleTrait and RelCompositeTrait were designed
> > to handle this situation. However, I couldn't make them work with the
> > top-down optimizer. The reason is that when we register a RelNode with a
> > composite trait in MEMO, VolcanoPlanner flattens the composite trait into
> > the default trait value in RelSet.add -> RelTraitSet.simplify. That is,
> > the trait [SHARDED[a], SHARDED[b]] is converted to [ANY], so the
> > original traits cannot be derived in the PhysicalNode.derive methods.
> >
> > Second, we may try to model multiple sharding keys in a single trait. But
> > this complicates the implementation of PhysicalNode.passThrough/derive
> > significantly.
> > SHARDED[a1, a2], SHARDED[b1, b2] -> SHARDED[[a1, a2], [b1, b2]]
> >
> > Third, we may expose multiple traits using metadata. RelMdDistribution
> > would not work because it exposes only a single distribution, but a
> > custom handler could potentially fix that. However, it still would not be
> > integrated with the top-down optimizer, which makes the idea
> > questionable.
> >
> > To summarize, it seems that there is currently no easy way to handle
> > composite traits with a top-down optimizer. I wonder whether someone from
> > the dev list has already solved similar issues in Apache Calcite or other
> > optimizers. If so, what was the approach or best practice? Intuitively,
> > it seems that the RelMultipleTrait/RelCompositeTrait approach might be
> > the way to go. But why do we replace the original composite trait set
> > with the default value in the RelTraitSet.simplify routine?
> >
> > Regards,
> > Vladimir.
> >
>
