On Mon, Aug 26, 2024 at 1:37 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > For example, I don't see > how this gets us any closer to letting an extension fix a poor choice > of join order.
Thinking more about this particular sub-problem, let's say we're joining four tables A, B, C, and D. An extension wants to compel join order A-B-C-D. Let's suppose, however, that it wants to do this in a way where planning won't fail if that's impossible, so it wants to use disabled_nodes rather than skipping path generation entirely. When we're planning the baserels, we don't need to do anything special. When we plan 2-way joins, we need to mark all paths disabled except those originating from the A-B join. When we plan 3-way joins, we need to mark all paths disabled except those originating from an (A-B)-C join. When we plan the final 4-way join, we don't really need to do anything extra: the only way to end up with a non-disabled path at the top level is to pick a path from the (A-B)-C join and a path from D. There's a bit of nuance here, though. Suppose that when planning the A-B join, the planner generates HashJoin(SeqScan(B),Hash(A)). Does that path need to be disabled? If you think that join order A-B-C-D means that table A should be the driving table, then the answer is yes, because that path will lead to a join order beginning with B-A, not one beginning with A-B. But you might also have a mental model where it doesn't matter which side of the table is on which side of the join, and as long as you start by joining A and B in some way, that's good enough to qualify as an A-B join order. I believe actual implementations vary in which approach they take. I think that the beginning of add_paths_to_joinrel() looks like a useful spot to get control. You could, for example, have a hook there which returns a bitmask indicating which of merge-join, nested-loop, and hash join will be allowable for this call; that hook would then allow for control over the join method and the join order, and the join order control is strong enough that you can implement either of the two interpretations above. This idea theorizes that 0001 was wrong to make the path mask a per-RelOptInfo value, because there could be many calls to add_paths_to_joinrel() for a single RelOptInfo and, in this idea, every one of those can enforce a different mask. Potentially, such a hook could return additional information, either by using more bits of the bitmask or by returning other information via some other data type. For instance, I still believe that distinguishing between parameterized-nestloops and unparameterized-nestloops would be real darn useful, so we could have separate bits for each; or you could have a bit to control whether foreign-join paths get disabled (or are considered at all), or you could have separate bits for merge joins that involve 0, 1, or 2 sorts. Whether we need or want any or all of that is certainly debatable, but the point is that if you did want some of that, or something else, it doesn't look difficult to feed that information through to the places where you would need it to be available. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com