On Fri, Nov 28, 2025 at 2:21 PM Tomas Vondra <[email protected]> wrote:
> The patch identifies dimensions without restrictions, moves them aside,
> and does regular join search on the rest of the relations (some of which
> may be dimensions with restrictions).
>
> So the presence of dimensions with restrictions does not disable the
> optimization entirely, it just means the dimensions with restrictions
> won't benefit from it. So in the worst case, it'll perform just like
> before (assuming the optimization is kept cheap enough).
I'm guessing that the reason you're limiting this to relations without
restrictions is so that you don't have to reason about the join causing
cardinality changes. But I don't quite understand how you avoid having
that problem anyway. For example, suppose that we have a fact table F
which is joined to relations S1, S2, D1, D2, I1, and I2. The joins to S1
and S2 are joins on FKs to unconstrained dimension tables, i.e.
cardinality stays the same. The joins to D1 and D2 are joins on FKs to
constrained dimension tables, i.e. cardinality decreases. The joins to
I1 and I2 on average match more than once, i.e. cardinality increases.
The optimal join order is presumably something like F-D1-D2-S1-S2-I1-I2,
so that the number of rows flowing through each level of the join tree
is kept as small as possible, but how do you achieve this if the joins
to S1 and S2 are set aside?

In the presence of row-count-inflating joins, it's wrong to suppose that
S1 and S2 should be joined at the end. What seems more likely to be true
is that we should perform the join to S1 and the join to S2
consecutively. It's very common in my experience for complex reporting
queries to have a bunch of joins whose results matter very little: each
one changes the cardinality very little or not at all, so any ordering
of those joins produces essentially the same result, and it's probably
best to do them all at whatever point in the join sequence the
cardinality of the other input is at its lowest.

However, I'm not sure that even this is guaranteed. For instance,
suppose that in the above example there's no index on the column of F
that joins to S2. Could it be that the only reasonable way to join to S2
is a merge join? If so, it might be best to start by joining F to S2
using an index scan on each, and then continue by joining to D1, D2, S1,
I1, and I2 in that order.

Every time I start to think about this kind of optimization, I find
myself getting hung up on corner cases like these, which are maybe not
all that probable, but which I think people almost certainly do rely on
the planner to get right. I don't know what to do about that. I
certainly agree with the idea that we waste a lot of energy searching
through functionally identical join orders and that we should find a way
to avoid that, but I'm a little suspicious that avoiding regressions, or
even finding out whether we've introduced any, will prove difficult.
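To make that concrete, the kind of query I have in mind would look
roughly like this -- all of the table and column names are invented for
the sake of the example:

    SELECT count(*)
    FROM f
      JOIN s1 ON s1.id = f.s1_id  -- FK, no restriction: row count unchanged
      JOIN s2 ON s2.id = f.s2_id  -- FK, no restriction: row count unchanged
      JOIN d1 ON d1.id = f.d1_id  -- FK, restricted below: row count decreases
      JOIN d2 ON d2.id = f.d2_id  -- FK, restricted below: row count decreases
      JOIN i1 ON i1.f_id = f.id   -- matches >1 row on average: row count increases
      JOIN i2 ON i2.f_id = f.id   -- matches >1 row on average: row count increases
    WHERE d1.some_col = 'x'
      AND d2.other_col < 42;

With made-up numbers -- say f has a million rows, each dN join keeps 10%
of them, and each iN join triples them -- the order F-D1-D2-S1-S2-I1-I2
feeds only 10,000 rows into each of the s joins, whereas an order that
leaves s1 and s2 for the end feeds 90,000 rows into each of them. That's
the sense in which joining them last is wrong here.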
--
Robert Haas
EDB: http://www.enterprisedb.com