Alexander Korotkov <aekorot...@gmail.com> writes: > On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> 2. The tests in the above-mentioned message show that in most cases >> where mcelem_array_contained_selec falls through to the "rough >> estimate", the resulting rowcount estimate is just 1, ie we are coming >> out with very small selectivities. Although that path will now only be >> taken when there are no stats, it seems like we'd be better off to >> return DEFAULT_CONTAIN_SEL instead of what it's doing. I think there >> must be something wrong with the "rough estimate" logic. Could you >> recheck that?
> I think the wrong think with "rough estimate" is that assumption about > independent occurrences of items is very unsuitable even for "rough > estimate". The following example shows that "rough estimate" really works > in the case of independent occurrences of items. ... > It this particular case "rough estimate" is quite accurate. But in most > part of cases it behaves really bad. It is why I started to invent > calc_distr and etc. So, I think return DEFAULT_CONTAIN_SEL is OK unless > we've some better ideas. OK. Looking again at that code, I notice that it also punts and returns DEFAULT_CONTAIN_SEL if it's not given MCELEM stats, which it more or less has to because without even a minfreq the whole calculation is just hot air. And there are no plausible scenarios where compute_array_stats would produce an MCELEM slot but no count histogram. So that says there is no point in sweating over this case, unless you have an idea how to produce useful results without MCELEM. So I think it's sufficient to punt at the top of the function if no histogram, and take out the various attempts to cope with the case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers