Hi Julian, First of all, I want to clarify that if we decide to do the cross-join suppression, we would have to do that for JoinCommuteRule either. I missed that in the original email.
Now back to your comments. The proposal assumes that we start with a connected join graph without the cross-products and never generate other cross-products. This covers many common queries, but not all. First, as you mentioned, some queries may potentially benefit from cross-products. I do not have specific queries in mind, but I cannot prove that such queries don't exist either. Second, we may have cross-products in an otherwise connected join graph if the join condition is located in the WHERE clause. For such queries, we may need to do some pre-processing, like filter pushdown. Therefore, I assume that real-world products would have to do some preliminary analysis and/or processing to decide whether it is safe to use the cross-join suppression. This might not be very convenient for new users, but IMO perfectly fine for production-grade systems because they typically do complex multi-phase optimization anyway. For example, recall how VoltDB decides whether to perform the join planning in the physical phase depending on the number of joins in the query [1]. Regarding cardinality estimations, the main problem is that the built-in Calcite mechanism is pretty imprecise because they use mainly row count and magic numbers. For example, in TPC-H queries, the cardinality estimations easily reach billions and billions of rows. And even if the system has more advanced cardinality estimators, such as histograms, the estimation errors are likely to build up quickly still [2]. That said, it could be difficult to design a robust heuristic to suppress particular joins. The advantage of a straightforward cross-join suppression heuristic is that it is a good choice for a good fraction of real-world queries. To clarify, the proposal is not to make the cross-join suppression the default behavior. Rather, we may add it as a configuration property to join planning rules, that could be used by advanced users. Regards, Vladimir. [1] https://github.com/VoltDB/voltdb/blob/0f2993cb9e1efe7c2c95cf68b83f10903e2697d3/src/frontend/org/voltdb/compiler/PlannerTool.java#L274 [2] https://dl.acm.org/doi/10.14778/2850583.2850594 сб, 13 февр. 2021 г. в 23:05, Julian Hyde <[email protected]>: > Two separate points. > > 1. I recall that there is an important class of plans that first forms the > Cartesian product of some very small relations, and then joins everything > else to it. Does anyone else encounter these plans? Would these plans be > unavailable if we made your proposed change? > > 2. If I join on a low cardinality attribute (eg customers to suppliers on > state) the result is almost as bad a cross join. I presume you would want > to treat this join similarly. If so, it would make sense to have the rules > on selectivity (or similar) rather than structure alone. > > Julian > > > On Feb 12, 2021, at 23:04, Vladimir Ozerov <[email protected]> wrote: > > > > Hi, > > > > Join order planning is an important optimization problem. The widely-used > > heuristic is to consider all bushy trees without cross-joins. There is > > proof [1] that a pair of commute and associate rules is a complete > ruleset > > to explore all bushy trees without the cross-joins if the initial join > tree > > has no cross-joins. > > > > In Apache Calcite, we do not suppress cross-products. As a result, even > > simple join topologies, like a chain, cannot be planned in a reasonable > > time for more than 6-7 tables. How do you feel if we add an optional > > cross-join suppression to the JoinAssociateRule, and possibly > > JoinPushThroughJoinRule? > > > > The cross-join suppression is not the only problem with the exhaustive > join > > planning in Apache Calcite. But at the very least, it is simple to > > implement, and it extends the number of tables in a join that could be > > planned exhaustively for some topologies by additional 2-5 tables. > > > > Regards, > > Vladimir. > > > > [1] https://dl.acm.org/doi/10.14778/2732977.2732997 >
