Hi Vladimir, Julian,

I want to distinguish between two cases.

Some projects may decide to use Calcite's distribution trait. To my knowledge, this is not a common pattern because the trait is not really integrated into Calcite: it is not destroyed or adjusted in rules and operators as needed, it is not integrated into EnumerableConvention.enforce, etc.

Other projects may decide to use a custom distribution trait. Examples are Apache Flink, Hazelcast, and some other private projects we work on. There are many reasons to do this. A couple of examples:

1. Calcite's distribution produces a logical exchange, while production-grade optimizers are typically multi-phase and want the distribution convention to produce physical exchanges in one or more dedicated physical phases.
2. Some systems may have custom requirements for distribution, such as propagating the number of shards, supporting multiple equivalent keys, etc.

But in both cases, the bottom line is that Enumerable currently cannot work with either built-in or custom distributions, because the associated code is not implemented in Calcite's core. And even if we add fully-fledged support for the built-in distribution to Enumerable, many projects will continue using custom distribution traits, because the exchange is a physical operation with lots of backend-specific quirks, and any attempt to model it abstractly in Calcite's core is likely to miss some edge cases. The same applies to any other custom trait that depends on columns: Enumerable will not be able to process it correctly.

Therefore, instead of having definitely broken code, it might be better to take a defensive approach in which the whole Enumerable backend provides a clear and consistent contract: we support collation and reset everything else. IMO it is better because it matches the current behavior and would never cause strange bugs in user code.

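To make the "support collation and reset everything else" contract concrete, here is a minimal sketch in plain Java. It is not Calcite code: the class EnumerableTraitContract, the method resetUnsupported, and the string-keyed trait map are hypothetical stand-ins for a real trait set such as RelTraitSet, assumed only for illustration. The convention trait is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model, NOT Calcite's API: a trait set is modelled as a map
// from a trait-definition name to a printable trait value.
final class EnumerableTraitContract {
    // Assumption: collation is the only column-dependent trait the backend understands.
    static final Set<String> SUPPORTED = Set.of("collation");

    // Keeps supported traits as they are and resets every other trait to the
    // default value registered for its definition.
    static Map<String, String> resetUnsupported(Map<String, String> input,
                                                Map<String, String> defaults) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : input.entrySet()) {
            String traitDef = entry.getKey();
            if (SUPPORTED.contains(traitDef)) {
                // Known trait: propagate the requested value unchanged.
                result.put(traitDef, entry.getValue());
            } else {
                // Unknown trait (built-in distribution, custom traits, ...):
                // do not try to interpret it, fall back to its default.
                String fallback = defaults.get(traitDef);
                if (fallback == null) {
                    throw new IllegalArgumentException("No default for trait: " + traitDef);
                }
                result.put(traitDef, fallback);
            }
        }
        return result;
    }
}
```

With an input of collation "[0 ASC]" and a custom distribution "hash[1]", and a default of "ANY" for the distribution, this sketch keeps the collation and returns "ANY" for the distribution. The point of the design is that an operator never claims to satisfy a trait it does not understand.
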
If in the future we invest in the proper integration of the built-in distribution or figure out how to "externalize" the trait propagation for Enumerable operators, we may relax this statement.

Please let me know if it makes any sense.

Regards,
Vladimir.

On Tue, May 4, 2021 at 21:02, Julian Hyde <[email protected]> wrote:

> > I would say known in-core vs unknown trait is a reasonable approach to
> > distinguish traits.
>
> Easy, but not reasonable. It will make it very difficult to reuse
> existing rels and rules (e.g. Enumerable) in a downstream project that
> has defined its own traits.
>
> On Tue, May 4, 2021 at 10:44 AM Vladimir Sitnikov
> <[email protected]> wrote:
> >
> > > It seems arbitrary to include Collation but exclude other traits.
> >
> > I would say known in-core vs unknown trait is a reasonable approach to
> > distinguish traits.
> >
> > Vladimir
