On 14.07.2011 18:57, Pavan Deolasee wrote:
> On Thu, Jul 14, 2011 at 11:46 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> I'd say that seems way too complex for such a small use case and we've
>> only just fixed the bugs from 8.4 vacuum map complexity. The code's
>> looking very robust now and I'm uneasy that such changes are really
>> worth it.
>
> Thanks Simon for looking at the patch.
>
> I am not sure if the use case is really narrow. Today, we dirty the pages in
> both the passes and also emit WAL records. Just the heap scan can take a
> very long time for large tables, blocking the autovacuum worker threads from
> doing useful work on other tables. If I am not wrong, we use ring buffers
> for vacuum which would most-likely force those buffers to be written/read
> twice to the disk.

Seems worthwhile to me. What bothers me a bit is the need for the new
64-bit LSN value on each heap page. Also, note that temporary tables are
not WAL-logged, so there are no LSNs for them.

How does this interact with the visibility map? If you set the
visibility map bit after vacuuming indexes, a subsequent vacuum will not
visit the page. The second vacuum will update relindxvacxlogid/off, but
it will not clean up the dead line pointers left behind by the first
vacuum. Now the LSN on the page differs from the one stored in pg_class,
so subsequent pruning will not remove the dead line pointers either. I
think you can sidestep that if you check that the page's vacuum LSN <=
vacuum LSN in pg_class, instead of equality.
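
To illustrate the comparison I mean, here is a minimal sketch. The type
VacLSN and the function dead_line_pointers_removable() are made up for
this mail and are not taken from the patch; they only show why <= keeps
working after a later vacuum has skipped the page, while == does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; not PostgreSQL's actual XLogRecPtr. */
typedef uint64_t VacLSN;

/*
 * Hypothetical helper: may pruning reclaim the dead line pointers on a page?
 *
 * page_vac_lsn: LSN stamped on the page by the vacuum that left the dead
 *               line pointers behind.
 * rel_vac_lsn:  LSN of the last vacuum that finished its index pass, as
 *               recorded in pg_class.
 *
 * With equality, a page skipped by a later vacuum would never match again.
 * With <=, any page stamped by a vacuum that has since completed qualifies.
 */
static bool
dead_line_pointers_removable(VacLSN page_vac_lsn, VacLSN rel_vac_lsn)
{
    return page_vac_lsn <= rel_vac_lsn;
}
```

A page stamped 100 stays removable once pg_class has moved on to 200,
whereas a page stamped 300 by a vacuum still in progress does not.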

Ignoring the issue stated in the previous paragraph, I think you wouldn't
actually need a 64-bit LSN. A smaller counter is enough, as wrap-around
doesn't matter. In fact, a single bit would be enough. After a
successful vacuum, the counter on each heap page (with dead line
pointers) is N, and the value in pg_class is N. There are no other
values on the heap, because vacuum will have cleaned them up. When you
begin the next vacuum, it will stamp pages with N+1. So at any stage,
there is only one of two values on any page, so a single bit is enough.
(But as I said, that doesn't hold if vacuum skips some pages thanks to
the visibility map.)
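
A toy model of the single-bit scheme, to make the argument concrete.
ToyPage, toy_vacuum_visit() and toy_prune() are invented names, and the
model assumes that every vacuum visits every page, which is exactly the
assumption the visibility map breaks.

```c
#include <stdbool.h>

/* Hypothetical per-page state: one flag plus the one-bit vacuum stamp. */
typedef struct ToyPage
{
    bool has_dead_lps;  /* page carries dead line pointers */
    bool stamp;         /* one-bit counter written by the heap pass */
} ToyPage;

/* The stamp a new vacuum uses: the flip of the last completed one. */
static bool
toy_next_stamp(bool rel_stamp)
{
    return !rel_stamp;
}

/* Heap pass over one page; rel_stamp is the value stored in pg_class. */
static void
toy_vacuum_visit(ToyPage *page, bool rel_stamp, bool found_dead_tuples)
{
    /* Leftovers of the last completed vacuum are safe to reclaim. */
    if (page->has_dead_lps && page->stamp == rel_stamp)
        page->has_dead_lps = false;

    /* New dead tuples, or leftovers of a failed vacuum, wait for the
     * index pass of this vacuum and get its stamp. */
    if (found_dead_tuples || page->has_dead_lps)
    {
        page->has_dead_lps = true;
        page->stamp = toy_next_stamp(rel_stamp);
    }
}

/* Pruning reclaims dead line pointers only when the stamps match. */
static bool
toy_prune(ToyPage *page, bool rel_stamp)
{
    if (!page->has_dead_lps || page->stamp != rel_stamp)
        return false;
    page->has_dead_lps = false;
    return true;
}
```

If a page is stamped by one vacuum and then skipped by the next, the
value in pg_class flips past it and toy_prune() never matches again,
which is the visibility map problem described above.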

Is there something in place to make sure that pruning uses an up-to-date
relindxvacxlogid/off value? I guess it doesn't matter if it's
out of date; you'll just miss the opportunity to remove some dead tuples.

Seems odd to store relindxvacxlogid/off as two int32 columns. Store it
in one uint64 column, or invent a new datatype for LSNs, or store it as
text in %X/%X format.
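
For the single-column and text options, something along these lines
would do. This is only a sketch: lsn_pack(), lsn_format() and lsn_parse()
are hypothetical helpers written for this mail, not existing backend
functions, and they assume the LSN is a pair of 32-bit halves (log id
and offset).

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Combine the two 32-bit halves into a single 64-bit value. */
static uint64_t
lsn_pack(uint32_t xlogid, uint32_t xrecoff)
{
    return ((uint64_t) xlogid << 32) | xrecoff;
}

/* Write the LSN in %X/%X form; returns what snprintf() returns. */
static int
lsn_format(uint64_t lsn, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Parse %X/%X text back into a 64-bit value; false on malformed input. */
static bool
lsn_parse(const char *text, uint64_t *lsn)
{
    uint32_t xlogid;
    uint32_t xrecoff;

    if (sscanf(text, "%" SCNx32 "/%" SCNx32, &xlogid, &xrecoff) != 2)
        return false;
    *lsn = lsn_pack(xlogid, xrecoff);
    return true;
}
```

Either representation round-trips, so the choice is mostly about what is
convenient to read and compare in the catalog.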
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com