On 12/15/22 17:46, Tom Lane wrote:
> James Finnerty <jfinn...@amazon.com> writes:
>> This patch looks good to me. I have two very minor nits: The inflation
>> of the sample size by 10% is arbitrary but it doesn't seem unreasonable
>> or concerning. It just makes me curious if there are any known cases
>> that motivated adding this logic.
>
> I wondered why, too.
>
Not sure about known cases, but the motivation is explained in the comment
before the 10% is applied. The reltuples value is never going to be spot
on, and we use it to determine what fraction of the table to sample
(because all built-in TABLESAMPLE methods accept only a fraction to
sample, not an expected number of rows).

If pg_class.reltuples is lower than the actual row count, we'll end up
with a sample fraction that's too high. That's mostly harmless, as we'll
simply discard some of the sampled rows locally. But if pg_class.reltuples
is higher, we'll end up sampling too few rows (fewer than targrows). This
can happen e.g. after a bulk delete.

Yes, the 10% value is mostly arbitrary, and maybe it's pointless. How much
can the stats change with a 10% larger sample? Probably not much, so is it
really solving anything?

Also, maybe it's fixing the issue at the wrong level - if stale reltuples
are the problem, maybe the right fix is to make reltuples more accurate on
the source instance: decrease autovacuum_vacuum_scale_factor, or perhaps
also look at pg_stat_all_tables.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
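
For illustration, a rough sketch of the arithmetic and the kind of
sampling query that gets sent to the remote - the table name and numbers
here are made up, not taken from the actual postgres_fdw code:

    -- suppose targrows = 30000 and remote pg_class.reltuples = 1000000:
    --   sample_frac = 30000 / 1000000 = 0.03    (3%)
    --   inflated by 10%: 0.03 * 1.1   = 0.033   (3.3%)
    -- TABLESAMPLE takes the fraction as a percentage:
    SELECT * FROM remote_tbl TABLESAMPLE SYSTEM (3.3);

If the table actually has 2M rows, this samples ~66k rows and the extra
ones get discarded locally; if it shrank to 500k rows after a bulk delete,
we only get ~16.5k rows, i.e. fewer than targrows.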
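
And a quick way to check for stale reltuples on the source instance - a
sketch, assuming a table named remote_tbl:

    -- compare the planner's estimate with the cumulative statistics
    SELECT c.relname,
           c.reltuples,
           s.n_live_tup,
           s.n_dead_tup,
           s.last_autovacuum,
           s.last_autoanalyze
    FROM pg_class c
    JOIN pg_stat_all_tables s ON s.relid = c.oid
    WHERE c.relname = 'remote_tbl';

If reltuples and n_live_tup diverge a lot, a VACUUM or ANALYZE on the
source (or a lower autovacuum_vacuum_scale_factor) would refresh the
estimate.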