I have logged https://issues.apache.org/jira/browse/CALCITE-4559, "Create 'interface RexRule', a modular rewrite for row-expressions". Abstracting RexNode rewrites as objects would be a major step toward achieving the goals in this thread.
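To illustrate what "rewrites as objects" could look like, here is a minimal sketch. It is not the interface proposed in CALCITE-4559: `Expr`, `ExprRule`, and `rewrite` are hypothetical names on a toy expression tree, not Calcite classes, and the only rule shown is the operand swap discussed further down in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Toy model of modular expression rewrites; none of these types are Calcite's. */
final class RewriteRuleSketch {
  /** Minimal expression node: a kind ("=", "ref", "lit"), its text, and operands. */
  static final class Expr {
    final String kind;
    final String text;
    final List<Expr> operands;

    Expr(String kind, String text, Expr... operands) {
      this.kind = kind;
      this.text = text;
      this.operands = Arrays.asList(operands);
    }

    /** Prints leaves as their text and binary calls in infix form. */
    @Override public String toString() {
      if (operands.isEmpty()) {
        return text;
      }
      return operands.get(0) + " " + text + " " + operands.get(1);
    }
  }

  /** Hypothetical rule: returns a rewritten node, or empty if it does not match. */
  interface ExprRule {
    Optional<Expr> apply(Expr e);
  }

  /** Rewrites "literal = ref" to "ref = literal"; leaves everything else alone. */
  static final ExprRule LITERAL_TO_RIGHT = e -> {
    if (e.kind.equals("=")
        && e.operands.get(0).kind.equals("lit")
        && e.operands.get(1).kind.equals("ref")) {
      return Optional.of(new Expr("=", "=", e.operands.get(1), e.operands.get(0)));
    }
    return Optional.empty();
  };

  /** Applies the first matching rule once; returns the input if none matches. */
  static Expr rewrite(Expr e, List<ExprRule> rules) {
    for (ExprRule rule : rules) {
      Optional<Expr> result = rule.apply(e);
      if (result.isPresent()) {
        return result.get();
      }
    }
    return e;
  }

  public static void main(String[] args) {
    Expr e = new Expr("=", "=", new Expr("lit", "10"), new Expr("ref", "$0"));
    System.out.println(rewrite(e, Arrays.asList(LITERAL_TO_RIGHT)));
  }
}
```

Because each rule is a separate object, the set of active rules becomes a plain list that can be configured. The actual proposal is in the JIRA case above.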
Now is a great chance to give feedback on this design. The APIs for registering rules and applying rules will be expensive to change later.

On Sat, Mar 13, 2021 at 9:37 AM Vladimir Ozerov <[email protected]> wrote:
>
> Hi Julian,
>
> I agree that in your example normalization may have some different concerns
> compared to simplification. However, both normalization and simplification
> sometimes address similar problems too. For example, the simplification
> may decrease the search space, but so does the normalization. E.g.
> normalized reordering of operands in a join condition may allow for the
> merge of equivalent nodes that otherwise would be considered
> non-equivalent. Do any of the currently implemented rules depend on some
> normalized representation?
>
> Also, as many rules (such as join reorder rules) generate filters, I would
> argue that moving the normalization to a separate phase might cause the
> unnecessary expansion of the search space.
>
> The idea I expressed above is inspired by CockroachDB (again :-)). In
> CockroachDB, expressions are part of the MEMO and treated similarly to
> relational operators, which allows for the unified rule infrastructure for
> both operators and expressions. Expressions are created using a
> context-aware builder, which knows the set of active normalization rules.
> Whenever a builder is to create a new expression (not necessarily
> the top-level), the normalization rules are invoked in a heuristic manner.
> The code generation is used to build the heuristic rule executor. Both
> normalization and simplification (in our terms) rules are invoked here. For
> example, see [1] (normalization) and [2] (simplification). Finally, the
> expression is registered in MEMO. As a result, every expression ever
> produced is always in a normalized/simplified form.
>
> I am not saying that we should follow this approach.
> But IMO (1) unified
> handling of simplification and normalization through rules and (2) a single
> entry point for all normalization (builder) are interesting design
> decisions, as they offer both flexibility and convenience.
>
> Regards,
> Vladimir.
>
> [1] https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/scalar.opt#L8
> [2] https://github.com/cockroachdb/cockroach/blob/release-21.1/pkg/sql/opt/norm/rules/bool.opt#L30
>
> On Fri, Mar 12, 2021 at 07:15, Julian Hyde <[email protected]> wrote:
>
> > Without simplifications, many trivial RelNodes would be produced. It is
> > beneficial to have those in RelBuilder; if they were in rules, the trivial
> > RelNodes (and equivalence sets) would still be present, increasing the size
> > of the search space.
> >
> > I want to draw a distinction between simplification and normalization. A
> > normalized form is relied upon throughout the system. Suppose, for example,
> > that we always normalize ‘RexLiteral = RexInputRef’ to ‘RexInputRef =
> > RexLiteral’. If a rule encountered the former (non-normalized) case, it
> > would not be a bug if the rule failed with, say, a ClassCastException.
> >
> > So, I disagree with Vladimir that 'RexSimplify may also be considered a
> > “normalization”’. If simplification is turned off, each rule must be able
> > to deal with the unsimplified expressions.
> >
> > Also, the very idea of normalizations being optional, enabled by system
> > properties or other config, is rather disturbing, because the rules
> > probably don’t know that the normalization has been turned off.
> >
> > The only place for normalization, in my opinion, is explicitly, in a
> > particular planner phase. For example, pulling up all filters before
> > attempting to match materialized views.
> >
> > Julian
> >
> > > On Mar 11, 2021, at 10:37 AM, Vladimir Ozerov <[email protected]> wrote:
> > >
> > > In our practice, we also had some problems with normalization.
> > > First, we
> > > observed problems with the unwanted (and sometimes
> > > incorrect) simplification of expressions with CASTs and literals which came
> > > from RexSimplify. I couldn't find an easy way to disable that behavior.
> > > Note that RexSimplify may also be considered a "normalization". Second, we
> > > implemented a way to avoid Project when doing join reordering but had some
> > > issues with operator signatures due to lack of automatic normalization for
> > > expressions for permuted inputs. These two cases demonstrate two opposite
> > > views: sometimes you want a specific normalization to happen automatically,
> > > but sometimes you want to disable it.
> > >
> > > Perhaps an alternative approach could be to unify all simplification and
> > > normalization logic and split it into configurable rules. Then, we may add
> > > these rules as a separate rule set to the planner, which would be invoked
> > > heuristically every time an operator with expressions is registered in
> > > MEMO. In this case, a user would not need to bother about RexNode
> > > constructors. To clarify, by "rules" I do not mean heavy-weight rules
> > > similar to normal rules. Instead, they might be simple pattern+method pairs
> > > that could even be compiled into a static program using Janino. This
> > > approach could be very flexible and convenient: a single place in the code
> > > where all rewrites happen, complete control of the optimization rules,
> > > modular rules instead of monolithic code (like in RexSimplify). The obvious
> > > downside is that it would require more time to implement than other proposed
> > > approaches.
> > >
> > > What do you think about that?
> > >
> > > Regards,
> > > Vladimir.
> > >
> > > On Thu, Mar 11, 2021 at 13:33, Vladimir Sitnikov <[email protected]> wrote:
> > >
> > >> Stamatis>just the option to use it or not in a more friendly way
> > >> Stamatis>than a system property.
> > >>
> > >> As far as I remember, the key issue here is that new RexBuilder(...) is
> > >> quite a common pattern,
> > >> and what you suggest looks like "everyone would have to provide an extra
> > >> argument when creating RexBuilder".
> > >>
> > >> On top of that, there are use cases like "new RexCall(...)" in the static
> > >> context (see org.apache.calcite.rex.RexUtil#not).
> > >>
> > >> Making the uses customizable adds significant overhead with doubtful gains.
> > >>
> > >> I have not explored the route though, so there might be solutions.
> > >> For instance, it might work if we have an in-core dependency injection that
> > >> would hide the complexity
> > >> when coding :core, however, I don't think we could expose DI to Calcite
> > >> users.
> > >>
> > >> Vladimir
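
The builder-centric idea from the quoted messages (lightweight pattern+method rules run whenever an expression is created, followed by registration in a memo) can be sketched as below. This is only an illustration under stated assumptions: `NormalizingBuilderSketch`, `Call`, `Rule`, and `literalToRight` are hypothetical names, expressions are reduced to strings, and nothing here is Calcite or CockroachDB code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Toy builder that normalizes on construction and interns results in a memo. */
final class NormalizingBuilderSketch {
  /** Binary call such as "$0 = 10"; operands are plain strings to keep this short. */
  static final class Call {
    final String op;
    final String left;
    final String right;

    Call(String op, String left, String right) {
      this.op = op;
      this.left = left;
      this.right = right;
    }

    /** Textual key used to detect equivalent calls. */
    String digest() {
      return left + " " + op + " " + right;
    }
  }

  /** A lightweight rule: a pattern plus the method applied when it matches. */
  static final class Rule {
    final Predicate<Call> pattern;
    final UnaryOperator<Call> method;

    Rule(Predicate<Call> pattern, UnaryOperator<Call> method) {
      this.pattern = pattern;
      this.method = method;
    }
  }

  private final List<Rule> rules = new ArrayList<>();
  private final Map<String, Integer> memo = new LinkedHashMap<>();

  /** Registers a rule; the active rule set is plain builder configuration. */
  NormalizingBuilderSketch addRule(Rule rule) {
    rules.add(rule);
    return this;
  }

  /** Builds a call, runs each matching rule once in order, then interns it. */
  int call(String op, String left, String right) {
    Call c = new Call(op, left, right);
    for (Rule rule : rules) {
      if (rule.pattern.test(c)) {
        c = rule.method.apply(c);
      }
    }
    Integer id = memo.get(c.digest());
    if (id == null) {
      id = memo.size();
      memo.put(c.digest(), id);
    }
    return id;
  }

  int memoSize() {
    return memo.size();
  }

  /** Rule: in an equality, move a literal (here: a string of digits) to the right. */
  static Rule literalToRight() {
    return new Rule(
        c -> c.op.equals("=") && c.left.matches("\\d+") && !c.right.matches("\\d+"),
        c -> new Call(c.op, c.right, c.left));
  }

  public static void main(String[] args) {
    NormalizingBuilderSketch b = new NormalizingBuilderSketch().addRule(literalToRight());
    int first = b.call("=", "10", "$0");
    int second = b.call("=", "$0", "10");
    // With the rule active, both spellings resolve to the same memo entry.
    System.out.println(first == second);
  }
}
```

With the rule registered, both spellings of the comparison share one memo entry; with an empty rule set they occupy two. That is the search-space effect of normalization described in the thread, and it also shows why a rule that assumes the normalized form would be surprised if the rule set were switched off by configuration.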
