I wrote:
> On 12/28/17, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> I think that perhaps maxmincount should also use the dynamic
>> values_cnt_remaining rather than the static one. After all, things
>> included in the MCV don't get represented in the histogram. When I've seen
>> planning problems due to skewed distributions I also usually see redundant
>> values in the histogram boundary list which I think should be in the MCV
>> list instead. But I have not changed that here, pending discussion.
>
> I think this is also a good idea, but I haven't thought it through. If
> you don't go this route, I would move this section back out of the
> loop as well.
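A minimal sketch of the dynamic-cap idea, assuming the threshold logic is roughly "average count times a margin, floored at a small hard-coded minimum, capped at maxmincount". It only illustrates why the order of the two clamps matters once maxmincount is derived from values_cnt_remaining. The names MCV_MIN_COUNT and num_bins, the 1.25 margin, and the floor value of 2 are assumptions for illustration, not code taken from analyze.c.

```c
#include <assert.h>

/* Hypothetical floor on the MCV threshold; the value 2 is assumed. */
#define MCV_MIN_COUNT 2.0

/*
 * Floor applied BEFORE the cap.
 * avgcount: average sample count of a nonnull value.
 * values_cnt_remaining: sample rows not yet claimed by the MCV list.
 * num_bins: hypothetical histogram bucket count.
 */
static double mincount_floor_then_cap(double avgcount,
                                      double values_cnt_remaining,
                                      int num_bins)
{
    double mincount = avgcount * 1.25;  /* assumed margin over average */
    double maxmincount;

    assert(num_bins > 0);
    if (mincount < MCV_MIN_COUNT)
        mincount = MCV_MIN_COUNT;
    /* Dynamic cap: shrinks toward zero as the MCV list grows. */
    maxmincount = values_cnt_remaining / (double) num_bins;
    if (mincount > maxmincount)
        mincount = maxmincount;         /* can undercut the floor */
    return mincount;
}

/* Cap applied first, floor applied last, so the floor always holds. */
static double mincount_cap_then_floor(double avgcount,
                                      double values_cnt_remaining,
                                      int num_bins)
{
    double mincount = avgcount * 1.25;
    double maxmincount;

    assert(num_bins > 0);
    maxmincount = values_cnt_remaining / (double) num_bins;
    if (mincount > maxmincount)
        mincount = maxmincount;
    if (mincount < MCV_MIN_COUNT)
        mincount = MCV_MIN_COUNT;
    return mincount;
}
```

With plenty of rows remaining (say 30000 rows and 100 bins), both orderings give the same threshold. Once values_cnt_remaining / num_bins drops below the floor, only the cap-then-floor version keeps the threshold at its minimum.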
I did some quick and dirty testing of this, and I just want to note that if maxmincount is computed from values_cnt_remaining, clamping mincount to its hard-coded minimum must come after clamping it to maxmincount, since maxmincount can now go arbitrarily low.

I'll be travelling for a few days, but I'll do some testing on some data sets soon.

While looking through the archives for more info, I saw this thread

https://www.postgresql.org/message-id/32261.1496611829%40sss.pgh.pa.us

which showcases the opposite problem: for more uniform distributions, there are too many MCVs. It's not directly relevant to your problem, but if I have time, I'll try testing an approach suggested in that thread alongside your patch to see how the two interact.

-John Naylor