Maxence Ahlouche wrote: > It seems to me that since the pg_stats view is supposed to be > human-readable, it would make sense to show a human-readable version > of n_distinct. > Currently, when the stats collector estimates that the number of > distinct values is more than 10% of the total row count, what is > stored in pg_statistic.stadistinct is -1 * n_distinct / totalrows, the > rationale being that if new rows are inserted in the table, they are > likely to introduce new values, and storing that value allows the > stadistinct not to get stale too fast. > > You can find attached a simple WIP patch to show the proper n_distinct > value in pg_stats. > > * Is this desired? > * Would it make sense to add a column in the pg_stats view to display > the information "lost", that is the fact that postgres will assume > that inserting new rows means a higher n_distinct? > * Am I right to assume that totalrows in the code > (src/backend/commands/analyze.c:2170) actually corresponds to > n_live_tup? That's what I gathered from glancing at the code, but I > might be wrong. > * Should the catalog version be changed for this kind of change? > * Should I add this patch to the commitfest? > > If this patch is actually desired, I'll update the documentation as well. > I'm guessing this patch would break scripts relying on the pg_stats > view, but I do not know how much we want to avoid that, since they > should rely on the base tables rather than on the views.
This may make things easier for those who are confused by a negative entry, but it will obfuscate matters for those who are not. I don't think that is a win, particularly since the semantics are explained in great detail in the documentation of "pg_stats". So I am -1 on that one. Yours, Laurenz Albe