On Thu, Dec 04, 2025 at 08:52:16AM +0800, WangYu wrote: > I was reviewing the recent patch > v19-0003-Include-Extended-Statistics-in-pg_dump.patch and noticed a > couple of small typo issues in the explanatory comments — nothing > that affects the functionality.
Please be careful that the mailing lists of Postgres use bottom-posting for thread discussions. Please see here for details: https://en.wikipedia.org/wiki/Posting_style#Bottom-posting > Here are the two minor fixes I’d suggest: > 1. “ndistintinct” should be “ndistinct”. > 2. “depdendencies” should be “dependencies”. Indeed. These exact typos existed in some of the previous iterations of the patch posted on this thread. I have begun looking at 0001 and 0002 more seriously, and can now share a couple of comments about them. My biggest issue with 0001 is the severe lack of documentation for all the routines that are now in attribute_stats.c which are published so as they can be used for the import functions of extended stats: 1) statatt_get_type(), that retrieves a set of information from a relation and one of its attributes. This is lightly documented in extended_statistics_update() from 0002, with an argument claiming that we do the same as attribute statistics. If one then looks at attribute_stats.c, we have a not-really-helpful "derive information from attribute". The trick is that we have an implied dependency here: statatt_get_type() needs to be called *before* statatt_get_elem_type() for attribute *and* extended stats, statatt_get_elem_type() being fed data from statatt_get_type(). 2) statatt_get_elem_type(), interesting piece of the puzzle that acts as a wrapper for the type cache. An important part here, it seems to me, would be to at least tell that this relies on data retrieved from statatt_get_type(), initially. Once I've noticed this dependency the logic felt a bit cleaner to understand, but we have no docs explaining any of that.. 3) statatt_init_empty_tuple(), that initializes a set of data to be used for updates and/or insertion. My first thought here is about "inherited", which turns around an assumption in attribute_statistics_update(). How should callers assign its value? For attribute_statistics_update() it comes as an input argument. Also why are these values initialized as they are and why should these make sense when applied to pg_statistic. Extended stats set the "inherited" argument to false in import_expressions(). That seems equally important to document, both in a comment in extended_stats.c and at the top of the function. Of course it means that this refers to the fact that the stats include values from stats tables, which do not apply to extended stats. But one would have to guess this fact after knowing that this relates to the catalog pg_statistic.. 4) statatt_set_slot is slightly simpler. Still here, it is really easy to miss that this routine should be fed data retrieved by statatt_get_type(). staop can also be an eq or an lt operator. stakind is one of the STATISTIC_KIND_* values, etc, etc. 5) text_to_stavalues() has a bit more explanation, with some data fed from statatt_get_type() for the atttypmod and atttypid. We have also some fix of input coming from statatt_get_elem_type(), for elemtypid. As far as I can see, there is nothing in these functions that require them to be located in attribute_stats.c anymore. Let's move that to stats_utils.c, common place for all the shared facilities used by these stats modules. Or it could also be a new, separate patch, for the manipulation and extraction of the tuple data related to the pg_statistic tuples. I am not OK with these being in attribute_stats.c, and as far as I can see they have no contents that force these routines to Perhaps you know what these imply because you have written this code originally in ce207d2a7901, but exposing them also means that it is very important to document at least some concepts and assumptions around them so as it is possible to guide somebody that may want to use this code: - What are the input arguments from? - And what are the results? - In which circumstances should these be used? Then about 0002. There are a bunch of assumptions embedded inside import_expressions() that are interdependent with the calls of these attribute routines which are a bit hard to evaluate by themselves, at least it's not clear why some of these operations are done and why they are done the way they are presented in the patch. The code processes an array of expressions and processes them in a sequential fashion, repeating checks based on extended_stats_exprs_element. Each step is going to need more explanation to connect the dots. For example, why the first checks on exprs_nulls are important, with a second step of checks for each STATISTIC_KIND_*. Too much is let to the reader to guess, making the code complicated to maintain as written, at least IMHO. Is there any need to locate these new functions and code in extended_stats.c, actually? A separation into a new file seems like it would be a cleaner result, leaving extended_stats.c to deal with the import of the new data, including the fetch and deletion of any existing data in pg_stat_ext that would be updated or inserted. Perhaps name that extended_stats_funcs.c, as these are about the direct SQL functions, import and deletion. For the tests, it looks like it would be better to have everything in a new file, like a stats_ext_import.sql for the clear and restore bits. A lot of the tests are copy-pastes of surrounding queries. Perhaps it would be better to use a SQL function wrapper or an IN clause with multiple relations? The patch has a lot of bloat with these repeated queries.. Perhaps we should use a split for the elements in the extended stats array instead of jsonb_pretty(). The results get quite long here. As of 0002, there are actually two independent pieces dealt with: the deletion of extended stats with pg_clear_extended_stats(), and the insert/update of extended stats with pg_restore_extended_stats(). I'd suggest to split that into two parts, with the clear being first in rank. The deletion is simpler, and getting that in first simplifies the review of the import part with less input to deal with. -- Michael
signature.asc
Description: PGP signature
