Thank you for the feedback. I'll try to prototype it for all the affected
rules.

Regards,
Vladimir

чт, 18 февр. 2021 г. в 13:02, Stamatis Zampetakis <[email protected]>:

> Hi Vladimir,
>
> Thanks for bringing up this topic.
>
> There are various works who highlight the importance of cartesian products
> for optimality and try to derive techniques for exploring the complete
> search space efficiently.
> Having said that indeed not considering cartesian products is a common &
> popular heuristic for queries with many relations so I find it a good idea
> to give to the users the option to use it or not.
>
> Best,
> Stamatis
>
> On Sun, Feb 14, 2021 at 9:58 AM Vladimir Ozerov <[email protected]>
> wrote:
>
> > Hi Julian,
> >
> > First of all, I want to clarify that if we decide to do the cross-join
> > suppression, we would have to do that for JoinCommuteRule either. I
> missed
> > that in the original email.
> >
> > Now back to your comments. The proposal assumes that we start with a
> > connected join graph without the cross-products and never generate other
> > cross-products. This covers many common queries, but not all. First, as
> you
> > mentioned, some queries may potentially benefit from cross-products. I do
> > not have specific queries in mind, but I cannot prove that such queries
> > don't exist either. Second, we may have cross-products in an otherwise
> > connected join graph if the join condition is located in the WHERE
> clause.
> > For such queries, we may need to do some pre-processing, like filter
> > pushdown. Therefore, I assume that real-world products would have to do
> > some preliminary analysis and/or processing to decide whether it is safe
> to
> > use the cross-join suppression. This might not be very convenient for new
> > users, but IMO perfectly fine for production-grade systems because they
> > typically do complex multi-phase optimization anyway. For example, recall
> > how VoltDB decides whether to perform the join planning in the physical
> > phase depending on the number of joins in the query [1].
> >
> > Regarding cardinality estimations, the main problem is that the built-in
> > Calcite mechanism is pretty imprecise because they use mainly row count
> and
> > magic numbers. For example, in TPC-H queries, the cardinality estimations
> > easily reach billions and billions of rows. And even if the system has
> more
> > advanced cardinality estimators, such as histograms, the estimation
> errors
> > are likely to build up quickly still [2]. That said, it could be
> difficult
> > to design a robust heuristic to suppress particular joins. The advantage
> of
> > a straightforward cross-join suppression heuristic is that it is a good
> > choice for a good fraction of real-world queries.
> >
> > To clarify, the proposal is not to make the cross-join suppression the
> > default behavior. Rather, we may add it as a configuration property to
> join
> > planning rules, that could be used by advanced users.
> >
> > Regards,
> > Vladimir.
> >
> > [1]
> >
> >
> https://github.com/VoltDB/voltdb/blob/0f2993cb9e1efe7c2c95cf68b83f10903e2697d3/src/frontend/org/voltdb/compiler/PlannerTool.java#L274
> > [2] https://dl.acm.org/doi/10.14778/2850583.2850594
> >
> > сб, 13 февр. 2021 г. в 23:05, Julian Hyde <[email protected]>:
> >
> > > Two separate points.
> > >
> > > 1. I recall that there is an important class of plans that first forms
> > the
> > > Cartesian product of some very small relations, and then joins
> everything
> > > else to it. Does anyone else encounter these plans? Would these plans
> be
> > > unavailable if we made your proposed change?
> > >
> > > 2. If I join on a low cardinality attribute (eg customers to suppliers
> on
> > > state) the result is almost as bad a cross join. I presume you would
> want
> > > to treat this join similarly. If so, it would make sense to have the
> > rules
> > > on selectivity (or similar) rather than structure alone.
> > >
> > > Julian
> > >
> > > > On Feb 12, 2021, at 23:04, Vladimir Ozerov <[email protected]>
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Join order planning is an important optimization problem. The
> > widely-used
> > > > heuristic is to consider all bushy trees without cross-joins. There
> is
> > > > proof [1] that a pair of commute and associate rules is a complete
> > > ruleset
> > > > to explore all bushy trees without the cross-joins if the initial
> join
> > > tree
> > > > has no cross-joins.
> > > >
> > > > In Apache Calcite, we do not suppress cross-products. As a result,
> even
> > > > simple join topologies, like a chain, cannot be planned in a
> reasonable
> > > > time for more than 6-7 tables. How do you feel if we add an optional
> > > > cross-join suppression to the JoinAssociateRule, and possibly
> > > > JoinPushThroughJoinRule?
> > > >
> > > > The cross-join suppression is not the only problem with the
> exhaustive
> > > join
> > > > planning in Apache Calcite. But at the very least, it is simple to
> > > > implement, and it extends the number of tables in a join that could
> be
> > > > planned exhaustively for some topologies by additional 2-5 tables.
> > > >
> > > > Regards,
> > > > Vladimir.
> > > >
> > > > [1] https://dl.acm.org/doi/10.14778/2732977.2732997
> > >
> >
>

Reply via email to