Re: Statistics Import and Export

Corey Huinker Fri, 19 Jul 2024 18:59:02 -0700

> * Doc build error and malformatting.
>

Looking into it.



> * I'm not certain that we want all changes to relation stats to be non-
> transactional. Are there transactional use cases? Should it be an
> option? Should it be transactional for pg_set_relation_stats() but non-
> transactional for pg_restore_relation_stats()?
>

It's non-transactional because that's how ANALYZE does it to avoid bloating
pg_class. We _could_ do it transactionally, but on restore we'd immediately
have a pg_class that was 50% bloat.


>
> * The documentation for the pg_set_attribute_stats() still refers to
> upgrade scenarios -- shouldn't that be in the
> pg_restore_attribute_stats() docs? I imagine the pg_set variant to be
> used for ad-hoc planner stuff rather than upgrades.
>

Noted.


>
> * For the "WARNING: stat names must be of type text" I think we need an
> ERROR instead. The calling convention of name/value pairs is broken and
> we can't safely continue.
>

They can't be errors, because any one error fails the whole pg_upgrade.


> * The huge list of "else if (strcmp(statname, mc_freqs_name) == 0) ..."
> seems wasteful and hard to read. I think we already discussed this,
> what was the reason we can't just use an array to map the arg name to
> an arg position type OID?
>

That was my overreaction to the dislike that the P_argname enum got in
previous reviews.

We'd need an array of struct like

argname  (ex. "mc_vals")
argtypeoid  (one of: int, text, real, rea[])
argtypename (name we want to call the argtypeoid (integer, text. real,
real[] about covers it).
argpos (position in the arg list of the corresponding pg_set_ function


>
> * How much error checking did we decide is appropriate? Do we need to
> check that range_length_hist is always specified with range_empty_frac,
> or should we just call that the planner's problem if one is specified
> and the other not? Similarly, range stats for a non-range type.
>

I suppose we can let that go, and leave incomplete stat pairs in there.

The big risk is that somebody packs the call with more than 5 statkinds,
which would overflow the struct.


> * I think most of the tests should be of pg_set_*_stats(). For
> pg_restore_, we just want to know that it's translating the name/value
> pairs reasonably well and throwing WARNINGs when appropriate. Then, for
> pg_dump tests, it should exercise pg_restore_*_stats() more completely.
>

I was afraid you'd suggest that, in which case I'd break up the patch into
the pg_sets and the pg_restores.


> * It might help to clarify which arguments are important (like
> n_distinct) vs not. I assume the difference is that it's a non-NULLable
> column in pg_statistic.
>

There are NOT NULL stats...now. They might not be in the future. Does that
change your opinion?


>
> * Some arguments, like the relid, just seem absolutely required, and
> it's weird to just emit a WARNING and return false in that case.


Again, we can't fail.Any one failure breaks pg_upgrade.


> * To clarify: a return of "true" means all settings were successfully
> applied, whereas "false" means that some were applied and some were
> unrecognized, correct? Or does it also mean that some recognized
> options may not have been applied?
>

True means "at least some stats were applied. False means "nothing was
modified".


> * pg_set_attribute_stats(): why initialize the output tuple nulls array
> to false? It seems like initializing it to true would be safer.
>

+1


>
> * please use a better name for "k" and add some error checking to make
> sure it doesn't overrun the available slots.
>

k was an inheritance from analzye.c, from whence the very first version was
cribbed. No objection to renaming.



> * the pg_statistic tuple is always completely replaced, but the way you
> can call pg_set_attribute_stats() doesn't imply that -- calling
> pg_set_attribute_stats(..., most_common_vals => ..., most_common_freqs
> => ...) looks like it would just replace the most_common_vals+freqs and
> leave histogram_bounds as it was, but it actually clears
> histogram_bounds, right? Should we make that work or should we document
> better that it doesn't?
>

That would complicate things. How would we intentionally null-out one stat,
while leaving others unchanged? However, this points out that I didn't
re-instate the re-definition that applied the NULL defaults.

Re: Statistics Import and Export

Reply via email to