Re: [HACKERS] COUNT and Performance ...

Tom Lane Sun, 02 Feb 2003 10:05:03 -0800

=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <[EMAIL PROTECTED]> writes:
> In special cases there can be another way to avoid seq scans:
> [ use pgstattuple() ]


But pgstattuple does do a sequential scan of the table.  You avoid a lot
of the executor's tuple-pushing and plan-node-traversing machinery that
way, but the I/O requirement is going to be exactly the same.

> If people want to count ALL rows of a table. The contrib stuff is pretty 
> useful. It seems to be transaction safe.

Not entirely.  pgstattuple uses HeapTupleSatisfiesNow(), which means you
get a count of tuples that are committed good in terms of the effects of
transactions committed up to the instant each tuple is examined.  This
is in general different from what count(*) would tell you, because it
ignores snapshotting.  It'd be quite unrepeatable too, in the face of
active concurrent changes --- it's very possible for pgstattuple to
count a single row twice or not at all, if it's being concurrently
updated and the other transaction commits between the times pgstattuple
sees the old and new versions of the row.

> The performance boost is great (PostgreSQL 7.3, RedHat, 166Mhz):

I think your test case is small enough that the whole table is resident
in memory, so this measurement only accounts for CPU time per tuple and
not any I/O.  Given the small size of pgstattuple's per-tuple loop, the
speed differential is not too surprising --- but it won't scale up to
larger tables.

Sometime it would be interesting to profile count(*) on large tables
and see exactly where the CPU time goes.  It might be possible to shave
off some of the executor overhead ...

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] COUNT and Performance ...

Reply via email to