On Mon, Dec 16, 2013 at 6:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> On Mon, Dec 16, 2013 at 2:04 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> The problem is that that optimization is a crock; see the comments >>> for create_or_index_quals(). We can't just turn it loose to CNF-ify >>> every OR it might find. The case that we support at the moment is >>> to CNF-ify whichever single OR condition looks like the best win, >>> and it's hard to see how to do that without any index knowledge. > >> Could we get by without actually converting to CNF? > > The hard part is not extracting the partial qual. The hard part is > trying to make sure that adding this entirely-redundant scan qual doesn't > catastrophically degrade join size estimates.
OK, I had a feeling that's where the problem was likely to be. Do you have any thoughts about a more principled way of solving this problem? I mean, off-hand, it's not clear to me that the comments about this being a MAJOR HACK aren't overstated. I mean, if we expect a join qual for {X Y} to have a given selectivity, but then we pre-filter X with a clause that is deliberately redundant with that join qual, then the join qual does indeed become less selective when view as applying to the surviving rows. This does not strike me as very much different from the oft-encountered problem of estimating selectivity for a = 1 AND b = 1, where a and b are correlated. Whichever qual we apply first changes the data distribution such that the selectivity of the second qual is not the same as it would have been when applied to the entirety of the data. Estimating it that way would not be a hack; it would be reality. Now, it might be true that frobbing what is intended as a cache based on the knowledge that the cache will never be flushed is a hack. > The hack of making an > inverse adjustment to the original OR clause's selectivity works, more or > less, for a single join OR condition. I don't think it works if there's > several modified OR conditions (possibly covering different sets of > relations). I might be missing something, but I suspect it works fine if every path for the relation is generating the same rows. The partial qual is definitely going to be applied before the join qual, so the join qual will surely be hitting only data that's been pre-filtered by the partial qual, and so its selectivity will be correspondingly more. Where it seems to me that you'd run in to trouble is if we created one path that doesn't bother enforcing the partial qual (relying on the fact that it will be applied post-join) and another path that does. Now you've got the same selectivity estimate for the join in two cases where the incoming data distribution is significantly different, which is bad. Do you see another danger? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers