Hi Vladimir, Julian,

I want to distinguish between two cases.

Some projects may decide to use Calcite's distribution trait. To my
knowledge, this is not a common pattern because the trait is not really
integrated into Calcite: rules and operators do not destroy or adjust it
as needed, EnumerableConvention.enforce does not handle it, etc.

Other projects may decide to use a custom distribution trait. Examples are
Apache Flink, Hazelcast, and some other private projects we work on. There
are many reasons to do this. A couple of examples:
1. Calcite's distribution produces a logical exchange, while
production-grade optimizers are typically multi-phase and want the
distribution trait to produce physical exchanges in dedicated physical
phase(s).
2. Some systems may have custom requirements for distribution, such as
propagating the number of shards, supporting multiple equivalent keys, etc.

But in both cases, the bottom line is that Enumerable currently cannot
work with either the built-in or a custom distribution, because the
associated code is not implemented in Calcite's core. And even if we add
full-fledged support for the built-in distribution to Enumerable, many
projects will continue using custom distribution traits, because exchange
is a physical operation with lots of backend-specific quirks, and any
attempt to model it abstractly in Calcite's core is likely to miss some
edge cases.

The same applies to any other custom trait that depends on columns -
Enumerable will not be able to process it correctly.

Therefore, instead of keeping code that is definitely broken, it might be
better to take a defensive approach in which the whole Enumerable backend
provides a clear and consistent contract: we support collation and reset
everything else. IMO it is better because it matches the current behavior
and would never cause strange bugs in user code. If in the future we invest
in the proper integration of the built-in distribution or figure out how to
"externalize" the trait propagation for Enumerable operators, we may relax
this contract.
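
To make the proposed contract concrete, here is a minimal, self-contained
sketch. It does not use Calcite's API: traits are modelled as plain
name/value strings, and the class name, the normalize method, and the
"<default>" placeholder are all hypothetical names chosen for illustration.
It only shows the rule "keep collation, reset everything else"; the
convention trait is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnumerableTraitContract {
    // Hypothetical trait name and placeholder value; not Calcite identifiers.
    static final String COLLATION = "collation";
    static final String DEFAULT_VALUE = "<default>";

    /**
     * Returns the traits an Enumerable operator would expose under the
     * proposed contract: collation is preserved as-is, and every other
     * trait (built-in or custom) is reset to its default value.
     */
    static Map<String, String> normalize(Map<String, String> inputTraits) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> trait : inputTraits.entrySet()) {
            if (COLLATION.equals(trait.getKey())) {
                // The only trait Enumerable promises to understand.
                result.put(trait.getKey(), trait.getValue());
            } else {
                // Unknown or unsupported trait: never propagate it blindly.
                result.put(trait.getKey(), DEFAULT_VALUE);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> traits = new LinkedHashMap<>();
        traits.put("collation", "[0 ASC]");
        traits.put("distribution", "hash[1]");   // built-in or custom
        traits.put("shardCount", "16");          // custom, column-independent
        System.out.println(normalize(traits));
    }
}
```

The point of the sketch is that a custom trait is never copied through an
operator that does not know how to remap its columns; it is reset, and the
planner has to re-establish it explicitly.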

Please let me know if this makes sense.

Regards,
Vladimir.

On Tue, May 4, 2021 at 21:02, Julian Hyde <[email protected]> wrote:

> > I would say known in-core vs unknown trait is a reasonable approach to
> > distinguish traits.
>
> Easy, but not reasonable. It will make it very difficult to reuse
> existing rels and rules (e.g. Enumerable) in a downstream project that
> has defined its own traits.
>
> On Tue, May 4, 2021 at 10:44 AM Vladimir Sitnikov
> <[email protected]> wrote:
> >
> > > It seems arbitrary to include Collation but exclude other traits.
> >
> > I would say known in-core vs unknown trait is a reasonable approach to
> > distinguish traits.
> >
> > Vladimir
>
