Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Tomas Vondra Sat, 18 Nov 2017 13:41:36 -0800

Hi,

On 11/02/2017 08:15 PM, Tom Lane wrote:
> Haribabu Kommi <kommi.harib...@gmail.com> writes:
>> The changes are fine and now it reports proper live tuples in both
>> pg_class and stats. The other issue of continuous vacuum operation
>> leading to decrease of number of live tuples is not related to this
>> patch and that can be handled separately.
> 
> I did not like this patch too much, because it did nothing to fix
> the underlying problem of confusion between "live tuples" and "total
> tuples"; in fact, it made that worse, because now the comment on
> LVRelStats.new_rel_tuples is a lie.  And there's at least one usage
> of that field value where I think we do want total tuples not live
> tuples: where we pass it to index AM cleanup functions.  Indexes
> don't really care whether heap entries are live or not, only that
> they're there.  Plus the VACUUM VERBOSE log message that uses the
> field is supposed to be reporting total tuples not live tuples.
> 
> I hacked up the patch trying to make that better, finding along the
> way that contrib/pgstattuple shared in the confusion about what
> it was trying to count.  Results attached.
>


Thanks for looking into this. I agree your patch is more consistent and
generally cleaner.

> However, I'm not sure we're there yet, because there remains a fairly
> nasty discrepancy even once we've gotten everyone onto the same page
> about reltuples counting just live tuples: VACUUM and ANALYZE have
> different definitions of what's "live".  In particular they do not treat
> INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same.  Should we
> try to do something about that?  If so, what?  It looks like ANALYZE's
> behavior is pretty tightly constrained, judging by the comments in
> acquire_sample_rows.
> 

Yeah :-(

For the record (and people following this thread who are too lazy to
open the analyze.c and search for the comments), the problem is that
acquire_sample_rows does not count HEAPTUPLE_INSERT_IN_PROGRESS as live
(and HEAPTUPLE_DELETE_IN_PROGRESS as dead) as it assumes the transaction
will send the stats at the end.

Which is a correct assumption, but it means that when there is a
long-running transaction that inserted/deleted many rows, the reltuples
value will oscillate during VACUUM / ANALYZE runs.

ISTM we need to unify those definitions, probably so that VACUUM adopts
what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats
will be updated at the end of transaction, why shouldn't VACUUM do the
same thing?

The one reason I can think of is that VACUUM is generally expected to
run longer than ANALYZE, so the assumption that large transactions
complete later is not quite reliable.

Or is there some other reason why VACUUM can't treat the in-progress
tuples the same way?

> Another problem is that it looks like CREATE INDEX will set reltuples
> to the total number of heap entries it chose to index, because that
> is what IndexBuildHeapScan counts.  Maybe we should adjust that?
> 

You mean by only counting live tuples in IndexBuildHeapRangeScan,
following whatever definition we end up using in VACUUM/ANALYZE? Seems
like a good idea I guess, although CREATE INDEX not as frequent as the
other commands.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

Reply via email to