clear functions.

Michael Paquier Wed, 03 Dec 2025 23:28:43 -0800

On Thu, Dec 04, 2025 at 08:52:16AM +0800, WangYu wrote:
> I was reviewing the recent patch
> v19-0003-Include-Extended-Statistics-in-pg_dump.patch and noticed a
> couple of small typo issues in the explanatory comments — nothing
> that affects the functionality.


Please be careful that the mailing lists of Postgres use
bottom-posting for thread discussions.  Please see here for details:
https://en.wikipedia.org/wiki/Posting_style#Bottom-posting

> Here are the two minor fixes I’d suggest:
> 1. “ndistintinct” should be “ndistinct”.
> 2. “depdendencies” should be “dependencies”.

Indeed.  These exact typos existed in some of the previous iterations
of the patch posted on this thread.

I have begun looking at 0001 and 0002 more seriously, and can now
share a couple of comments about them. 

My biggest issue with 0001 is the severe lack of documentation for all
the routines that are now in attribute_stats.c which are published so
as they can be used for the import functions of extended stats:

1) statatt_get_type(), that retrieves a set of information from a
relation and one of its attributes.  This is lightly documented in
extended_statistics_update() from 0002, with an argument claiming that
we do the same as attribute statistics.  If one then looks at
attribute_stats.c, we have a not-really-helpful "derive information
from attribute".  The trick is that we have an implied dependency
here: statatt_get_type() needs to be called *before*
statatt_get_elem_type() for attribute *and* extended stats,
statatt_get_elem_type() being fed data from statatt_get_type().

2) statatt_get_elem_type(), interesting piece of the puzzle that acts
as a wrapper for the type cache.  An important part here, it seems to
me, would be to at least tell that this relies on data retrieved from
statatt_get_type(), initially.  Once I've noticed this dependency the
logic felt a bit cleaner to understand, but we have no docs explaining
any of that..

3) statatt_init_empty_tuple(), that initializes a set of data to be
used for updates and/or insertion.  My first thought here is about 
"inherited", which turns around an assumption in
attribute_statistics_update().  How should callers assign its value?
For attribute_statistics_update() it comes as an input argument.
Also why are these values initialized as they are and why should these
make sense when applied to pg_statistic.  Extended stats set the
"inherited" argument to false in import_expressions().  That seems
equally important to document, both in a comment in extended_stats.c
and at the top of the function.  Of course it means that this refers
to the fact that the stats include values from stats tables, which do
not apply to extended stats.  But one would have to guess this fact
after knowing that this relates to the catalog pg_statistic..

4) statatt_set_slot is slightly simpler.  Still here, it is really easy
to miss that this routine should be fed data retrieved by
statatt_get_type().  staop can also be an eq or an lt operator.
stakind is one of the STATISTIC_KIND_* values, etc, etc.

5) text_to_stavalues() has a bit more explanation, with some data fed
from statatt_get_type() for the atttypmod and atttypid.  We have also
some fix of input coming from statatt_get_elem_type(), for elemtypid.

As far as I can see, there is nothing in these functions that require
them to be located in attribute_stats.c anymore.  Let's move that to
stats_utils.c, common place for all the shared facilities used by
these stats modules.  Or it could also be a new, separate patch, for
the manipulation and extraction of the tuple data related to
the pg_statistic tuples.

I am not OK with these being in attribute_stats.c, and as far as I can
see they have no contents that force these routines to 

Perhaps you know what these imply because you have written this code
originally in ce207d2a7901, but exposing them also means that it is
very important to document at least some concepts and assumptions
around them so as it is possible to guide somebody that may want to
use this code:
- What are the input arguments from?
- And what are the results?
- In which circumstances should these be used?

Then about 0002.

There are a bunch of assumptions embedded inside import_expressions()
that are interdependent with the calls of these attribute routines
which are a bit hard to evaluate by themselves, at least it's not
clear why some of these operations are done and why they are done the
way they are presented in the patch.  The code processes an array of
expressions and processes them in a sequential fashion, repeating
checks based on extended_stats_exprs_element.  Each step is going to
need more explanation to connect the dots.  For example, why the first
checks on exprs_nulls are important, with a second step of checks for
each STATISTIC_KIND_*.  Too much is let to the reader to guess, making
the code complicated to maintain as written, at least IMHO.

Is there any need to locate these new functions and code in
extended_stats.c, actually?  A separation into a new file seems like
it would be a cleaner result, leaving extended_stats.c to deal with
the import of the new data, including the fetch and deletion of any
existing data in pg_stat_ext that would be updated or inserted.
Perhaps name that extended_stats_funcs.c, as these are about the
direct SQL functions, import and deletion.

For the tests, it looks like it would be better to have everything in
a new file, like a stats_ext_import.sql for the clear and restore
bits.

A lot of the tests are copy-pastes of surrounding queries.  Perhaps it
would be better to use a SQL function wrapper or an IN clause with
multiple relations?  The patch has a lot of bloat with these repeated
queries..  Perhaps we should use a split for the elements in the
extended stats array instead of jsonb_pretty().  The results get quite
long here.

As of 0002, there are actually two independent pieces dealt with: the
deletion of extended stats with pg_clear_extended_stats(), and the
insert/update of extended stats with pg_restore_extended_stats().  I'd
suggest to split that into two parts, with the clear being first in
rank.  The deletion is simpler, and getting that in first simplifies
the review of the import part with less input to deal with.
--
Michael

signature.asc
Description: PGP signature

Re: Re: Extended Statistics set/restore/clear functions.

Reply via email to