On Mon, 25 Mar 2019 at 23:36, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > > Attached is an updated patch, fixing all the issues pointed out so far. > Unless there are some objections, I plan to commit the 0001 part by the > end of this CF. Part 0002 is a matter for PG13, as previously agreed. >
Yes, I think that's reasonable. It looks to be in pretty good shape. I have reviewed most of the actual code, but note that I haven't reviewed the docs changes and I didn't spend much time reading code comments. It might benefit from a quick once-over comment tidy up. I just looked through the latest set of changes and I have a couple of additional review comments: In the comment about WIDTH_THRESHOLD, s/pg_statistic/pg_statistic_ext/. In statext_mcv_build(), I'm not convinced by the logic around keeping the whole MCV list if it fits. Suppose there were a small number of very common values, and then a bunch of uniformly distributed less common values. The sample might consist of all the common values, plus one or two instances of some of the uncommon ones, leading to a list that would fit, but it would not be appropriate to keep the uncommon values on the basis of having seen them only one or two times. The fact that the list of items seen fits doesn't by itself mean that they're all common enough to justify being kept. In the per-column stats case, there are a bunch of other checks that have to pass, which are intended to test not just that the list fits, but that it believes that those are all the items in the table. For MV stats, you don't have that, and so I think it would be best to just remove that test (the "if (ngroups > nitems)" test) and *always* call get_mincount_for_mcv_list() to determine how many MCV items to keep. Otherwise there is a risk of keeping too many MCV items, with the ones at the tail end of the list producing poor estimates. Regards, Dean