Hi Haisheng,

Thank you for the advice. This is exactly how I designed the distribution trait at the moment (approach 2 from my original email): as a List<int[]> instead of just int[]. My main concern was the increased complexity of trait propagation/derivation, as I have to manage these nested lists by hand. Nevertheless, it works well. So I hoped there were better built-in approaches that I could use. If the answer is negative, I'll continue using the original approach, with multiple alternatives managed manually.
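For reference, the "approach 2" layout mentioned above can be sketched roughly as below. The class and method names are hypothetical illustrations of the idea (a distribution trait carrying several equivalent alternative key lists), not Calcite API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "approach 2": a distribution trait that carries
// several equivalent sets of sharding keys, e.g. [[a1], [b1]] for a join
// on a.a1 = b.b1. Not Calcite API.
final class MultiDistribution {
    // Each int[] is one alternative list of sharding column ordinals.
    private final List<int[]> alternatives;

    MultiDistribution(List<int[]> alternatives) {
        this.alternatives = alternatives;
    }

    // A required key list is satisfied if any alternative matches it exactly.
    boolean satisfies(int[] requiredKeys) {
        return alternatives.stream().anyMatch(alt -> Arrays.equals(alt, requiredKeys));
    }

    public static void main(String[] args) {
        // Join on a.a1 = b.b1: output is sharded by either column 0 (a1)
        // or column 4 (b1), so both requirements are satisfied.
        MultiDistribution joinDist = new MultiDistribution(
                List.of(new int[]{0}, new int[]{4}));
        System.out.println(joinDist.satisfies(new int[]{0})); // prints "true"
        System.out.println(joinDist.satisfies(new int[]{4})); // prints "true"
        System.out.println(joinDist.satisfies(new int[]{2})); // prints "false"
    }
}
```

The nested-list bookkeeping mentioned above shows up when passThrough/derive must remap every alternative through each operator, which is where the hand-managed complexity comes from.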
Regards,
Vladimir.

Tue, 25 May 2021 at 20:30, Haisheng Yuan <[email protected]>:

> Hi Vladimir,
>
> Glad to see you raised the question.
>
> Here is the advice:
> Do not use RelMultipleTrait/RelCompositeTrait, which is fundamentally
> flawed and has many bugs. It can't work properly, whether top-down or
> bottom-up.
>
> Instead, we need to add an equivalent-keys bitmap as a property of the
> physical trait, like RelCollation and RelDistribution.
>
> For example:
>
> class RelDistributionImpl {
>     // list of distribution keys
>     private ImmutableIntList keys;
>
>     // list of equivalent bitsets for each distribution key
>     private ImmutableList<ImmutableBitSet> equivBitSets;
> }
>
> In trait satisfaction and column remapping, we also need to take equivalent
> keys into consideration. Some of this work needs to be done in the Calcite
> core framework.
>
> The Greenplum Orca optimizer has a similar strategy:
>
> https://github.com/greenplum-db/gporca/blob/master/libgpopt/include/gpopt/base/CDistributionSpecHashed.h#L44
>
> Regards,
> Haisheng Yuan
>
> On 2021/05/25 15:37:32, Vladimir Ozerov <[email protected]> wrote:
> > Hi,
> >
> > Consider a distributed SQL engine that uses a distribution property to
> > model exchanges. Consider the following physical tree. To perform a
> > distributed join, we co-locate tuples using the equijoin key. Now the
> > Join operator has two equivalent distributions - [a1] and [b1]. It is
> > critical to expose both distributions so that the top Aggregate can take
> > advantage of the co-location.
> >
> > Aggregate[group=b1]
> >   DistributedJoin[a.a1=b.b1] // SHARDED[a1], SHARDED[b1]
> >     Input[a]                 // SHARDED[a1]
> >     Input[b]                 // SHARDED[b1]
> >
> > A similar example for Project:
> >
> > Aggregate[group=$1]
> >   Project[$0=a, $1=a]        // SHARDED[$0], SHARDED[$1]
> >     Input                    // SHARDED[a]
> >
> > The question is how to model this situation properly?
> >
> > First, it seems that RelMultipleTrait and RelCompositeTrait were designed
> > to handle this situation.
> > However, I couldn't make them work with the top-down optimizer. The
> > reason is that when we register a RelNode with a composite trait in the
> > MEMO, VolcanoPlanner flattens the composite trait into the default trait
> > value in RelSet.add -> RelTraitSet.simplify. That is, the trait
> > [SHARDED[a], SHARDED[b]] will be converted to [ANY], so the original
> > traits cannot be derived in the PhysicalNode.derive methods.
> >
> > Second, we may try to model multiple sharding keys in a single trait, but
> > this complicates the implementation of PhysicalNode.passThrough/derive
> > significantly:
> >
> > SHARDED[a1, a2], SHARDED[b1, b2] -> SHARDED[[a1, a2], [b1, b2]]
> >
> > Third, we may expose multiple traits using metadata. RelMdDistribution
> > would not work, because it exposes only a single distribution, but a
> > custom handler could potentially fix that. However, it still would not be
> > integrated with the top-down optimizer, which makes the idea
> > questionable.
> >
> > To summarize, it seems that there is currently no easy way to handle
> > composite traits with a top-down optimizer. I wonder whether someone on
> > the dev list has already solved similar issues in Apache Calcite or other
> > optimizers. If so, what were the approaches or best practices?
> > Intuitively, it seems that the RelMultipleTrait/RelCompositeTrait
> > approach might be the way to go. But why do we replace the original
> > composite trait set with the default value in the RelTraitSet.simplify
> > routine?
> >
> > Regards,
> > Vladimir.
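Haisheng's suggested shape from the reply above (a key list plus an equivalence bitset per key) can be sketched with plain JDK types as follows. This is a hypothetical illustration: the real design would use Calcite's ImmutableIntList/ImmutableBitSet, and the class and method names here are made up:

```java
import java.util.BitSet;
import java.util.List;

// Sketch of the equivalent-keys idea using plain JDK types instead of
// Calcite's ImmutableIntList/ImmutableBitSet. Hypothetical, not Calcite API.
final class HashedDistribution {
    private final int[] keys;              // distribution key ordinals
    private final List<BitSet> equivSets;  // equivSets.get(i): columns equivalent to keys[i]

    HashedDistribution(int[] keys, List<BitSet> equivSets) {
        this.keys = keys;
        this.equivSets = equivSets;
    }

    // A required key list is satisfied if, position by position, each required
    // column is either the key itself or one of its declared equivalents.
    boolean satisfies(int[] requiredKeys) {
        if (requiredKeys.length != keys.length) {
            return false;
        }
        for (int i = 0; i < keys.length; i++) {
            int required = requiredKeys[i];
            if (required != keys[i] && !equivSets.get(i).get(required)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Join output sharded by column 0 (a1), with column 4 (b1) known
        // to be equivalent because of the a.a1 = b.b1 join condition.
        BitSet equiv = new BitSet();
        equiv.set(4);
        HashedDistribution dist = new HashedDistribution(new int[]{0}, List.of(equiv));
        System.out.println(dist.satisfies(new int[]{0})); // prints "true"
        System.out.println(dist.satisfies(new int[]{4})); // prints "true"
        System.out.println(dist.satisfies(new int[]{2})); // prints "false"
    }
}
```

With this shape there is a single distribution trait per node, so nothing is lost in RelTraitSet.simplify; the equivalence information travels inside the trait rather than as multiple alternative traits.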
