On 12/15/22 17:46, Tom Lane wrote:
> James Finnerty <jfinn...@amazon.com> writes:
>> This patch looks good to me. I have two very minor nits: The inflation
>> of the sample size by 10% is arbitrary but it doesn't seem unreasonable
>> or concerning. It just makes me curious if there are any known cases
>> that motivated adding this logic.
>
> I wondered why, too.
>
Not sure about known cases, but the motivation is explained in the comment
before the 10% is applied. The reltuples value is never going to be spot
on, and we use it to determine what fraction of the table to sample
(because all built-in TABLESAMPLE methods accept only a fraction to
sample, not an expected number of rows).

If pg_class.reltuples is lower than the actual row count, we'll end up
with a sample fraction that's too high. That's mostly harmless, as we'll
simply discard some of the sampled rows locally. But if pg_class.reltuples
is higher, we'll end up sampling too few rows (fewer than targrows). This
can happen e.g. after a bulk delete.

Yes, the 10% value is mostly arbitrary, and maybe it's pointless. How much
can the stats change with a 10% larger sample? Probably not much, so is it
really solving anything?

Also, maybe it's fixing the issue at the wrong level - if stale reltuples
are the problem, maybe the right fix is to make reltuples more accurate on
the source instance: decrease autovacuum_vacuum_scale_factor, or perhaps
also look at pg_stat_all_tables.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
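
For illustration, a rough sketch of the arithmetic and the kind of
sampling query that gets sent to the remote - the table name and numbers
here are made up, not taken from the actual postgres_fdw code:

    -- suppose targrows = 30000 and remote pg_class.reltuples = 1000000:
    --   sample_frac = 30000 / 1000000 = 0.03    (3%)
    --   inflated by 10%: 0.03 * 1.1   = 0.033   (3.3%)
    -- TABLESAMPLE takes the fraction as a percentage:
    SELECT * FROM remote_tbl TABLESAMPLE SYSTEM (3.3);

If the table actually has 2M rows, this samples ~66k rows and the extra
ones get discarded locally; if it shrank to 500k rows after a bulk delete,
we only get ~16.5k rows, i.e. fewer than targrows.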
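
And a quick way to check for stale reltuples on the source instance - a
sketch, assuming a table named remote_tbl:

    -- compare the planner's estimate with the cumulative statistics
    SELECT c.relname,
           c.reltuples,
           s.n_live_tup,
           s.n_dead_tup,
           s.last_autovacuum,
           s.last_autoanalyze
    FROM pg_class c
    JOIN pg_stat_all_tables s ON s.relid = c.oid
    WHERE c.relname = 'remote_tbl';

If reltuples and n_live_tup diverge a lot, a VACUUM or ANALYZE on the
source (or a lower autovacuum_vacuum_scale_factor) would refresh the
estimate.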