I looked into the behavior complained of in http://www.postgresql.org/message-id/CAMkU=1xliwdkfemkdwjznr_jmzuybzzrz4f22kxa3vg6pz9...@mail.gmail.com I'm still not sure whether anything else is going on in the original problem, but I now understand Jeff's simplified query. The planner does actually generate the desired plan, but it doesn't pick it because it misestimates the row count and hence the cost. The issue is that it generates an indexable OR clause by pulling a couple of sub-clauses out of the messy OR condition in the original query:
template2."id" = product2."id" or case when product1."id" is not null then 1 when template2."id" is not null then 0 end <> 1 and product2."id" = 2916353 If we focus our attention on just template2."id" = product2."id" or product2."id" = 2916353 we have something that can be used in a parameterized (by template2) bitmap scan of product2. We do get as far as finding that out and building a bitmap scan path, but the path is marked as yielding 2918 rows (the whole product table), not the 2 rows it actually will produce. That's because the parameterized path code is designed to assume that all identically-parameterized scans will produce the same number of rows, and it's already computed that row estimate without the benefit of the extracted OR clause. So this results in a factor-of-1400 overestimate of the cost of the nestloop, resulting in a wrong plan choice. We didn't have this problem in 9.1 or earlier because the planner did not assume that inner indexscan paths all produced the same number of rows. As of 9.2 there's an expectation that if two paths have the same parameterization then they will enforce the same set of pushed-down join clauses and hence yield the same number of rows. We could drop that assumption, but doing so would destroy the basis of add_path_precheck(), because a more expensive path could be worth keeping if it yields fewer rows. So we'd either have to give up two-phase estimation of join path costs entirely, or do rowcount estimation in the first phase which'd likely destroy most of its advantage anyway. I would not be totally sad to get rid of the two-phase estimates because they greatly complicate the API and logic in costsize.c, but they did seem to produce a useful speedup in planning of complex joins. If we don't do that, we need some less klugy way of dealing with indexclauses extracted from OR clauses. The idea I have about this is to go ahead and put such clauses into the regular join clause lists, but mark them as being derived from their original clauses (say by adding a field to RestrictInfo that links back to the original RestrictInfo). The effect of the marking would be that the derived clause would be ignored for cost and selectivity estimates anytime it is present in the same list as its parent clause, so that we don't end up double-counting it. I think this would let us get rid of the very klugy selectivity hacking that's currently done in orindxpath.c to solve a similar problem in the non-join case. A downside of this approach is that to preserve the same-number-of-rows assumption, we'd end up having to enforce the extracted clauses as filter clauses in parameterized paths, even if they'd not proved to be of any use as index quals. Whichever way we go, the resulting patch is likely to be too large and invasive for me to feel terribly comfortable about back-patching it into 9.2. AFAICT this issue only arises for indexquals extracted out of larger OR conditions, so maybe it's not worth taking such a risk for. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers