I have logged https://issues.apache.org/jira/browse/CALCITE-4559,
"Create 'interface RexRule', a modular rewrite for row-expressions".
Abstracting RexNode rewrites as objects would be a major step toward
achieving the goals in this thread.

Now is a great chance to give feedback on this design. The APIs for
registering rules and applying rules will be expensive to change
later.
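
To make the discussion concrete, here is a rough sketch of what a
modular rewrite could look like as "pattern+method" pairs. Everything
in it is hypothetical: Expr, Ref, Lit, Call, Rule and rewrite are
stand-ins invented for illustration, not Calcite classes and not the
API proposed in CALCITE-4559. It only shows the shape of the idea:
small rules applied bottom-up until no rule matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class RexRuleSketch {
  /** Hypothetical stand-in for a row-expression; NOT Calcite's RexNode. */
  interface Expr {}

  static final class Ref implements Expr {
    final int index;
    Ref(int index) { this.index = index; }
    @Override public String toString() { return "$" + index; }
  }

  static final class Lit implements Expr {
    final Object value;
    Lit(Object value) { this.value = value; }
    @Override public String toString() { return String.valueOf(value); }
  }

  static final class Call implements Expr {
    final String op;
    final List<Expr> operands;
    Call(String op, List<Expr> operands) { this.op = op; this.operands = operands; }
    @Override public String toString() {
      return operands.stream().map(Object::toString)
          .collect(Collectors.joining(", ", op + "(", ")"));
    }
  }

  /** Hypothetical rule: a pattern to match plus a method that rewrites. */
  static final class Rule {
    final Predicate<Expr> pattern;
    final UnaryOperator<Expr> method;
    Rule(Predicate<Expr> pattern, UnaryOperator<Expr> method) {
      this.pattern = pattern;
      this.method = method;
    }
  }

  static boolean isCall(Expr e, String op, int arity) {
    return e instanceof Call && ((Call) e).op.equals(op)
        && ((Call) e).operands.size() == arity;
  }

  static Expr operand(Expr e, int i) { return ((Call) e).operands.get(i); }

  /** Normalization: 'literal = ref' becomes 'ref = literal'. */
  static final Rule LITERAL_ON_RIGHT = new Rule(
      e -> isCall(e, "=", 2) && operand(e, 0) instanceof Lit
          && operand(e, 1) instanceof Ref,
      e -> new Call("=", List.of(operand(e, 1), operand(e, 0))));

  /** Simplification: NOT(NOT(x)) becomes x. */
  static final Rule NOT_NOT = new Rule(
      e -> isCall(e, "NOT", 1) && isCall(operand(e, 0), "NOT", 1),
      e -> operand(operand(e, 0), 0));

  /** Rewrites operands first, then applies rules at this node until none matches. */
  static Expr rewrite(Expr e, List<Rule> rules) {
    if (e instanceof Call) {
      Call c = (Call) e;
      List<Expr> operands = new ArrayList<>();
      for (Expr o : c.operands) {
        operands.add(rewrite(o, rules));
      }
      e = new Call(c.op, operands);
    }
    boolean changed = true;
    while (changed) {
      changed = false;
      for (Rule r : rules) {
        if (r.pattern.test(e)) {
          e = r.method.apply(e);
          changed = true;
        }
      }
    }
    return e;
  }

  public static void main(String[] args) {
    List<Rule> rules = List.of(LITERAL_ON_RIGHT, NOT_NOT);
    Expr e = new Call("NOT", List.of(new Call("NOT",
        List.of(new Call("=", List.of(new Lit(10), new Ref(0)))))));
    System.out.println(rewrite(e, rules)); // prints =($0, 10)
  }
}
```

Because the rule list is just data, a caller could choose which
rewrites are active per planner phase, which is the kind of
registration and application question the design needs feedback on.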

On Sat, Mar 13, 2021 at 9:37 AM Vladimir Ozerov <[email protected]> wrote:
>
> Hi Julian,
>
> I agree that in your example normalization may have somewhat different
> concerns than simplification. However, normalization and simplification
> sometimes address similar problems too. For example, simplification
> may decrease the search space, but so does normalization. E.g. a
> normalized ordering of operands in a join condition may allow
> equivalent nodes to be merged that would otherwise be considered
> non-equivalent. Do any of the currently implemented rules depend on some
> normalized representation?
>
> Also, as many rules (such as join reorder rules) generate filters, I would
> argue that moving the normalization to a separate phase might cause the
> unnecessary expansion of the search space.
>
> The idea I expressed above is inspired by CockroachDB (again :-)). In
> CockroachDB, expressions are part of the MEMO and treated similarly to
> relational operators, which allows for the unified rule infrastructure for
> both operators and expressions. Expressions are created using a
> context-aware builder, which knows the set of active normalization rules.
> Whenever the builder is about to create a new expression (not necessarily
> the top-level one), the normalization rules are invoked in a heuristic
> manner. Code generation is used to build the heuristic rule executor. Both
> normalization and simplification (in our terms) rules are invoked here. For
> example, see [1] (normalization) and [2] (simplification). Finally, the
> expression is registered in MEMO. As a result, every expression ever
> produced is always in a normalized/simplified form.
>
> I am not saying that we should follow this approach. But IMO (1) unified
> handling of simplification and normalization through rules and (2) a single
> entry point for all normalization (builder) are interesting design
> decisions, as they offer both flexibility and convenience.
>
> Regards,
> Vladimir.
>
> [1]
> https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/scalar.opt#L8
> [2]
> https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/bool.opt#L30
>
> On Fri, Mar 12, 2021 at 07:15, Julian Hyde <[email protected]> wrote:
>
> > Without simplifications, many trivial RelNodes would be produced. It is
> > beneficial to have those in RelBuilder; if they were in rules, the trivial
> > RelNodes (and equivalence sets) would still be present, increasing the size
> > of the search space.
> >
> > I want to draw a distinction between simplification and normalization. A
> > normalized form is relied upon throughout the system. Suppose, for example,
> > that we always normalize ‘RexLiteral = RexInputRef’ to ‘RexInputRef =
> > RexLiteral’. If a rule encountered the former case, it would not be a bug
> > if the rule failed with, say, a ClassCastException.
> >
> > So, I disagree with Vladimir that 'RexSimplify may also be considered a
> > “normalization”’. If simplification is turned off, each rule must be able
> > to deal with the unsimplified expressions.
> >
> > Also, the very idea of normalizations being optional, enabled by system
> > properties or other config, is rather disturbing, because the rules
> > probably don’t know that the normalization has been turned off.
> >
> > The only place for normalization, in my opinion, is explicitly, in a
> > particular planner phase. For example, pulling up all filters before
> > attempting to match materialized views.
> >
> > Julian
> >
> > > On Mar 11, 2021, at 10:37 AM, Vladimir Ozerov <[email protected]> wrote:
> > >
> > > In our practice, we also had some problems with normalization. First, we
> > > observed problems with the unwanted (and sometimes incorrect)
> > > simplification of expressions with CASTs and literals which came from
> > > RexSimplify. I couldn't find an easy way to disable that behavior.
> > > Note that RexSimplify may also be considered a "normalization". Second,
> > > we implemented a way to avoid Project when doing join reordering but had
> > > some issues with operator signatures due to the lack of automatic
> > > normalization of expressions for permuted inputs. These two cases
> > > demonstrate two opposite views: sometimes you want a specific
> > > normalization to happen automatically, but sometimes you want to
> > > disable it.
> > >
> > > Perhaps an alternative approach could be to unify all simplification and
> > > normalization logic and split it into configurable rules. Then, we may
> > > add these rules as a separate rule set to the planner, which would be
> > > invoked heuristically every time an operator with expressions is
> > > registered in MEMO. In this case, a user would not need to bother about
> > > RexNode constructors. To clarify, by "rules" I do not mean heavy-weight
> > > rules similar to normal rules. Instead, they might be simple
> > > pattern+method pairs that could even be compiled into a static program
> > > using Janino. This approach could be very flexible and convenient: a
> > > single place in the code where all rewrites happen, complete control of
> > > the optimization rules, and modular rules instead of monolithic code
> > > (like in RexSimplify). The obvious downside is that it would require
> > > more time to implement than the other proposed approaches.
> > >
> > > What do you think about that?
> > >
> > > Regards,
> > > Vladimir.
> > >
> > > On Thu, Mar 11, 2021 at 13:33, Vladimir Sitnikov <[email protected]> wrote:
> > >
> > >> Stamatis>just the option to use it or not in a more friendly way
> > >> Stamatis>than a system property.
> > >>
> > >> As far as I remember, the key issue here is that new RexBuilder(...) is
> > >> a quite common pattern, and what you suggest looks like "everyone would
> > >> have to provide an extra argument when creating RexBuilder".
> > >>
> > >> On top of that, there are use cases like "new RexCall(...)" in a
> > >> static context (see org.apache.calcite.rex.RexUtil#not).
> > >>
> > >> Making the uses customizable adds significant overhead with doubtful
> > >> gains.
> > >>
> > >> I have not explored the route though, so there might be solutions.
> > >> For instance, it might work if we have an in-core dependency injection
> > >> that would hide the complexity when coding :core; however, I don't
> > >> think we could expose DI to Calcite users.
> > >>
> > >> Vladimir
> > >>
> >
> >
