Re: pg_stats and range statistics

Egor Rogov Fri, 23 Jul 2021 11:06:09 -0700

Hi Tomas,

On 12.07.2021 16:04, Tomas Vondra wrote:

On 7/12/21 1:10 PM, Egor Rogov wrote:

Hi,


thanks for the review and corrections.

On 11.07.2021 21:54, Soumyadeep Chakraborty wrote:

Hello,

This should have been added with [1].

Excerpt from the documentation:
"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must
be extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.

Good point.

Attached is v2 of this patch with some cosmetic changes.

I wonder why "TODO: catalog version bump"? This patch doesn't change
the catalog structure, or am I missing something?

It changes system_views.sql, which is a catalog change, as it redefines
the pg_stats system view (it adds 3 more columns). So it changes what
you get after initdb, hence catversion has to be bumped.

Renamed the columns a bit and updated the docs to be a bit more
descriptive.
(range_length_empty_frac -> empty_range_frac, range_bounds_histogram ->
range_bounds_histograms)

I intended to make the same prefix ("range_") for all columns concerned
with range types, although I'm fine with the proposed naming.

Yeah, I'd vote to change empty_range_frac -> range_empty_frac.

One question:

We do have the option of representing the histogram of lower bounds
separately from the histogram of upper bounds, as two separate view
columns. Don't know if there is much utility though and there is a fair
bit of added complexity: see below. Thoughts?
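
For illustration only, the split being asked about could look like the
sketch below. It assumes the bounds histogram is already available as a
list of (lower, upper) pairs; the function name and the sample values are
hypothetical and are not part of the patch.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_bounds_histogram(
    ranges: Sequence[Tuple[T, T]],
) -> Tuple[List[T], List[T]]:
    """Split a combined bounds histogram into two per-bound lists.

    Each input item is one range from the histogram, given as a
    (lower, upper) pair. The lower bounds taken together form the
    histogram of lower bounds, and likewise for the upper bounds.
    """
    lowers = [lower for lower, _ in ranges]
    uppers = [upper for _, upper in ranges]
    return lowers, uppers


# Hypothetical sample data, not taken from a real table.
sample = [(1, 10), (3, 15), (7, 40)]
lowers, uppers = split_bounds_histogram(sample)
```

The view would then expose the two lists as separate columns instead of
one array of ranges.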

I thought about it too, and decided not to transform the underlying data
structure. As far as I can see, pg_stats never employed such
transformations. For example, STATISTIC_KIND_DECHIST is an array
containing the histogram followed by the average in its last element. It
is shown in pg_stats.elem_count_histogram as is, although it arguably
could be split into two fields. All in all, I believe pg_stats's job is
to "unpack" stavalues and stanumbers into meaningful fields, and not to
try to go deeper than that.
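
As a rough sketch of the kind of split that pg_stats does not do today,
the DECHIST array could be separated into the histogram and the trailing
average as shown below. The function name and the sample numbers are
hypothetical; only the layout (histogram first, average as the last
element) comes from the description above.

```python
from typing import List, Sequence, Tuple


def split_dechist(stanumbers: Sequence[float]) -> Tuple[List[float], float]:
    """Separate a DECHIST-style array into (histogram, average).

    The last element is treated as the average; everything before it
    is treated as the histogram.
    """
    if len(stanumbers) < 2:
        raise ValueError("expected at least one histogram entry plus the average")
    return list(stanumbers[:-1]), stanumbers[-1]


# Hypothetical sample values, not taken from a real table.
sample = [1.0, 2.0, 2.0, 3.0, 5.0, 2.4]
histogram, average = split_dechist(sample)
```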

Not a firm opinion, but pg_stats is meant to be easier to
read/understand for humans. So far the transformations were simple
because all the data was fairly simple, but the range stuff may need
a more complex transformation.

For example we do quite a bit more in pg_stats_ext views, because it
deals with multi-column stats.



In pg_stats_ext, yes, but not in pg_stats (at least until now).

Since no one has expressed a strong desire for a more complex
transformation, should we proceed with the proposed approach (with
further renaming empty_range_frac -> range_empty_frac as you suggested)?
Or should we wait more for someone to weigh in?



regards
