On 10/29/14, 11:49 AM, Jim Nasby wrote:
> On 10/21/14, 6:05 PM, Tom Lane wrote:
>> Jim Nasby <jim.na...@bluetreble.com> writes:
>>> - What happens if we run out of space to remember skipped blocks?
>> You forget some, and are no worse off than today. (This might be an
>> event worthy of logging, if the array is large enough that we don't
>> expect it to happen often ...)
> Makes sense. I'll see if there's some reasonable way to retry pages when the
> array fills up.
> I'll make the array 2k in size; that allows for 512 pages without spending a
> bunch of memory.
Attached is a patch for this. It also adds logging of unobtainable cleanup
locks, and refactors scanning a page for vacuum into its own function.
Anyone reviewing this might want to look at
https://github.com/decibel/postgres/commit/69ab22f703d577cbb3d8036e4e42563977bcf74b,
which is the refactor with no whitespace changes.
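In case it helps review, the bookkeeping itself is nothing fancy: a bounded
array of block numbers plus an overflow counter. Here is a minimal standalone
sketch of the idea (the names and the main() driver are illustrative only, not
the actual code in the patch):

/*
 * Standalone sketch of the retry-array bookkeeping (illustrative names;
 * not the patch itself): remember up to MAX_RETRY_PAGES skipped block
 * numbers, and simply count anything that overflows instead of growing.
 */
#include <stdio.h>
#include <stdint.h>

#define MAX_RETRY_PAGES 512		/* 512 * sizeof(uint32_t) = 2kB */

typedef uint32_t BlockNum;

static BlockNum retry_pages[MAX_RETRY_PAGES];
static int retry_pages_insert_ptr = 0;
static long retry_pages_skipped = 0;

static void
remember_skipped_block(BlockNum blkno)
{
	if (retry_pages_insert_ptr < MAX_RETRY_PAGES)
		retry_pages[retry_pages_insert_ptr++] = blkno;
	else
		retry_pages_skipped++;	/* no room; this page won't be retried */
}

int
main(void)
{
	BlockNum blkno;

	/* pretend we failed to get the cleanup lock on 600 pages */
	for (blkno = 0; blkno < 600; blkno++)
		remember_skipped_block(blkno);

	printf("remembered %d pages, gave up on %ld\n",
		   retry_pages_insert_ptr, retry_pages_skipped);
	return 0;
}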
I've verified this works correctly by connecting to a backend with gdb and
halting it with a page pinned. Both vacuum and vacuum freeze on that table do
what's expected, but I also get this warning (which AFAICT is a false positive):
decibel@decina.local=# vacuum verbose i;
INFO: vacuuming "public.i"
INFO: "i": found 0 removable, 399774 nonremovable row versions in 1769 out of
1770 pages
DETAIL: 200000 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
Retried cleanup lock on 0 pages, retry failed on 1, skipped retry on 0.
CPU 0.00s/0.06u sec elapsed 12.89 sec.
WARNING: buffer refcount leak: [105] (rel=base/16384/16385, blockNum=0, flags=0x106, refcount=2 1)
VACUUM
I am declaring retry_pages[] as a plain local array; my understanding is that
it only exists for the duration of this function, so that's OK. If not, I'll
palloc it. If it is OK, then I'll do the same for the freeze array.
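To illustrate the distinction I'm asking about, here's a throwaway sketch with
malloc() standing in for palloc() (names and sizes are illustrative only): a
plain local array lives on the stack just for the duration of the call, while
an explicit allocation has to be freed (or, with palloc, cleaned up along with
its memory context).

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define MAX_RETRY_PAGES 512

static void
with_local_array(void)
{
	/* ~2kB on the stack; the storage vanishes when the function returns */
	uint32_t retry_pages[MAX_RETRY_PAGES];

	retry_pages[0] = 42;
	printf("local array, first entry: %u\n", (unsigned) retry_pages[0]);
}

static void
with_explicit_allocation(void)
{
	/* palloc() would tie this to a memory context; with malloc() we must free */
	uint32_t *retry_pages = malloc(MAX_RETRY_PAGES * sizeof(uint32_t));

	if (retry_pages == NULL)
		return;
	retry_pages[0] = 42;
	printf("allocated array, first entry: %u\n", (unsigned) retry_pages[0]);
	free(retry_pages);
}

int
main(void)
{
	with_local_array();
	with_explicit_allocation();
	return 0;
}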
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
From 1752751903a8d51b7b3b618072b6b0687f9f141c Mon Sep 17 00:00:00 2001
From: Jim Nasby <jim.na...@bluetreble.com>
Date: Thu, 6 Nov 2014 14:42:52 -0600
Subject: [PATCH] Vacuum cleanup lock retry
This patch will retry failed attempts to obtain the cleanup lock on a
buffer. It remembers failed block numbers in an array and retries them after
vacuuming the rest of the relation. The array is currently fixed at 512
entries; lock attempts that fail once the array is full are not retried.
This patch also adds counters to report on failures, and refactors the guts
of the per-page vacuum scan into its own function.
---
src/backend/commands/vacuumlazy.c | 964 +++++++++++++++++++++-----------------
1 file changed, 541 insertions(+), 423 deletions(-)
diff --git a/src/backend/commands/vacuumlazy.c
b/src/backend/commands/vacuumlazy.c
index 3778d9d..240113f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -96,6 +96,14 @@
*/
#define SKIP_PAGES_THRESHOLD ((BlockNumber) 32)
+/*
+ * Instead of blindly skipping pages that we can't immediately acquire a
+ * cleanup lock for (assuming we're not freezing), we keep a list of pages we
+ * initially skipped, up to VACUUM_MAX_RETRY_PAGES. We retry those pages at the
+ * end of vacuuming.
+ */
+#define VACUUM_MAX_RETRY_PAGES 512
+
typedef struct LVRelStats
{
/* hasindex = true means two-pass strategy; false means one-pass */
@@ -143,6 +151,10 @@ static void lazy_vacuum_index(Relation indrel,
static void lazy_cleanup_index(Relation indrel,
IndexBulkDeleteResult *stats,
LVRelStats *vacrelstats);
+static void lazy_scan_page(Relation onerel, LVRelStats *vacrelstats,
+			   BlockNumber blkno, Buffer buf, Buffer vmbuffer, xl_heap_freeze_tuple *frozen,
+			   int nindexes, bool all_visible_according_to_vm,
+			   BlockNumber *empty_pages, BlockNumber *vacuumed_pages, double *nunused);
static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
			   int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
@@ -422,13 +434,15 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
{
BlockNumber nblocks,
blkno;
- HeapTupleData tuple;
char *relname;
BlockNumber empty_pages,
- vacuumed_pages;
- double num_tuples,
- tups_vacuumed,
- nkeep,
+ vacuumed_pages,
+ retry_pages[VACUUM_MAX_RETRY_PAGES];
+ int retry_pages_insert_ptr;
+ double retry_page_count,
+ retry_fail_count,
+ retry_pages_skipped,
+ cleanup_lock_waits,
nunused;
IndexBulkDeleteResult **indstats;
int i;
@@ -446,8 +460,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
- empty_pages = vacuumed_pages = 0;
- num_tuples = tups_vacuumed = nkeep = nunused = 0;
+ empty_pages = vacuumed_pages = retry_pages_insert_ptr = retry_page_count =
+ 	retry_fail_count = retry_pages_skipped = cleanup_lock_waits = nunused = 0;
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
@@ -508,18 +522,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
for (blkno = 0; blkno < nblocks; blkno++)
{
Buffer buf;
- Page page;
- OffsetNumber offnum,
- maxoff;
- bool tupgone,
- hastup;
- int prev_dead_count;
- int nfrozen;
- Size freespace;
bool all_visible_according_to_vm;
- bool all_visible;
- bool has_dead_tuples;
- TransactionId visibility_cutoff_xid = InvalidTransactionId;
if (blkno == next_not_all_visible_block)
{
@@ -617,6 +620,19 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
*/
if (!scan_all)
{
+ /*
+  * Remember the page that we're skipping, but only if there's
+  * still room.
+  *
+  * XXX it would be even better if we retried as soon as we
+  * filled retry_pages, but we should get very few retry pages
+  * anyway so let's not go overboard.
+  */
+ if (retry_pages_insert_ptr < VACUUM_MAX_RETRY_PAGES)
+ 	retry_pages[retry_pages_insert_ptr++] = blkno;
+ else
+ 	retry_pages_skipped++;
+
ReleaseBuffer(buf);
continue;
}
@@ -641,420 +657,55 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
}
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockBufferForCleanup(buf);
+ cleanup_lock_waits++;
/* drop through to normal processing */
}
- vacrelstats->scanned_pages++;
-
- page = BufferGetPage(buf);
-
- if (PageIsNew(page))
- {
- /*
- * An all-zeroes page could be left over if a backend
extends the
- * relation but crashes before initializing the page.
Reclaim such
- * pages for use.
- *
- * We have to be careful here because we could be
looking at a
- * page that someone has just added to the relation and
not yet
- * been able to initialize (see
RelationGetBufferForTuple). To
- * protect against that, release the buffer lock, grab
the
- * relation extension lock momentarily, and re-lock the
buffer. If
- * the page is still uninitialized by then, it must be
left over
- * from a crashed backend, and we can initialize it.
- *
- * We don't really need the relation lock when this is
a new or
- * temp relation, but it's probably not worth the code
space to
- * check that, since this surely isn't a critical path.
- *
- * Note: the comparable code in vacuum.c need not worry
because
- * it's got exclusive lock on the whole relation.
- */
- LockBuffer(buf, BUFFER_LOCK_UNLOCK);
- LockRelationForExtension(onerel, ExclusiveLock);
- UnlockRelationForExtension(onerel, ExclusiveLock);
- LockBufferForCleanup(buf);
- if (PageIsNew(page))
- {
- ereport(WARNING,
- (errmsg("relation \"%s\" page %u is
uninitialized --- fixing",
- relname, blkno)));
- PageInit(page, BufferGetPageSize(buf), 0);
- empty_pages++;
- }
- freespace = PageGetHeapFreeSpace(page);
- MarkBufferDirty(buf);
- UnlockReleaseBuffer(buf);
-
- RecordPageWithFreeSpace(onerel, blkno, freespace);
- continue;
- }
-
- if (PageIsEmpty(page))
- {
- empty_pages++;
- freespace = PageGetHeapFreeSpace(page);
-
- /* empty pages are always all-visible */
- if (!PageIsAllVisible(page))
- {
- START_CRIT_SECTION();
-
- /* mark buffer dirty before writing a WAL
record */
- MarkBufferDirty(buf);
-
- /*
- * It's possible that another backend has
extended the heap,
- * initialized the page, and then failed to
WAL-log the page
- * due to an ERROR. Since heap extension is
not WAL-logged,
- * recovery might try to replay our record
setting the page
- * all-visible and find that the page isn't
initialized, which
- * will cause a PANIC. To prevent that, check
whether the
- * page has been previously WAL-logged, and if
not, do that
- * now.
- */
- if (RelationNeedsWAL(onerel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
-
- PageSetAllVisible(page);
- visibilitymap_set(onerel, blkno, buf,
InvalidXLogRecPtr,
- vmbuffer,
InvalidTransactionId);
- END_CRIT_SECTION();
- }
-
- UnlockReleaseBuffer(buf);
- RecordPageWithFreeSpace(onerel, blkno, freespace);
- continue;
- }
-
- /*
- * Prune all HOT-update chains in this page.
- *
- * We count tuples removed by the pruning step as removed by
VACUUM.
- */
- tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
-
&vacrelstats->latestRemovedXid);
+ lazy_scan_page(onerel, vacrelstats, blkno, buf, vmbuffer, frozen,
+ 		nindexes, all_visible_according_to_vm,
+ 		&empty_pages, &vacuumed_pages, &nunused);
+ }
- /*
- * Now scan the page to collect vacuumable items and check for
tuples
- * requiring freezing.
- */
- all_visible = true;
- has_dead_tuples = false;
- nfrozen = 0;
- hastup = false;
- prev_dead_count = vacrelstats->num_dead_tuples;
- maxoff = PageGetMaxOffsetNumber(page);
+ /*
+  * Make a second attempt to acquire the cleanup lock on pages we skipped.
+  * Note that we don't have to worry about !scan_all here.
+  */
- /*
- * Note: If you change anything in the loop below, also look at
- * heap_page_is_all_visible to see if that needs to be changed.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
+ if (retry_pages_insert_ptr)
+ {
+ for (i = 0; i < retry_pages_insert_ptr; i++)
{
- ItemId itemid;
-
- itemid = PageGetItemId(page, offnum);
-
- /* Unused items require no processing, but we count 'em
*/
- if (!ItemIdIsUsed(itemid))
- {
- nunused += 1;
- continue;
- }
-
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- hastup = true; /* this page won't be
truncatable */
- continue;
- }
-
- ItemPointerSet(&(tuple.t_self), blkno, offnum);
-
- /*
- * DEAD item pointers are to be vacuumed normally; but
we don't
- * count them in tups_vacuumed, else we'd be
double-counting (at
- * least in the common case where heap_page_prune()
just freed up
- * a non-HOT tuple).
- */
- if (ItemIdIsDead(itemid))
- {
- lazy_record_dead_tuple(vacrelstats,
&(tuple.t_self));
- all_visible = false;
- continue;
- }
-
- Assert(ItemIdIsNormal(itemid));
-
- tuple.t_data = (HeapTupleHeader) PageGetItem(page,
itemid);
- tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(onerel);
-
- tupgone = false;
-
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
- {
- case HEAPTUPLE_DEAD:
-
- /*
- * Ordinarily, DEAD tuples would have
been removed by
- * heap_page_prune(), but it's possible
that the tuple
- * state changed since
heap_page_prune() looked. In
- * particular an INSERT_IN_PROGRESS
tuple could have
- * changed to DEAD if the inserter
aborted. So this
- * cannot be considered an error
condition.
- *
- * If the tuple is HOT-updated then it
must only be
- * removed by a prune operation; so we
keep it just as if
- * it were RECENTLY_DEAD. Also, if
it's a heap-only
- * tuple, we choose to keep it, because
it'll be a lot
- * cheaper to get rid of it in the next
pruning pass than
- * to treat it like an indexed tuple.
- */
- if (HeapTupleIsHotUpdated(&tuple) ||
- HeapTupleIsHeapOnly(&tuple))
- nkeep += 1;
- else
- tupgone = true; /* we can
delete the tuple */
- all_visible = false;
- break;
- case HEAPTUPLE_LIVE:
- /* Tuple is good --- but let's do some
validity checks */
- if (onerel->rd_rel->relhasoids &&
-
!OidIsValid(HeapTupleGetOid(&tuple)))
- elog(WARNING, "relation \"%s\"
TID %u/%u: OID is invalid",
- relname, blkno,
offnum);
-
- /*
- * Is the tuple definitely visible to
all transactions?
- *
- * NB: Like with per-tuple hint bits,
we can't set the
- * PD_ALL_VISIBLE flag if the inserter
committed
- * asynchronously. See SetHintBits for
more info. Check
- * that the tuple is hinted
xmin-committed because of
- * that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if
(!HeapTupleHeaderXminCommitted(tuple.t_data))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely
committed. But is it old
- * enough that everyone sees it
as committed?
- */
- xmin =
HeapTupleHeaderGetXmin(tuple.t_data);
- if
(!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin,
visibility_cutoff_xid))
- visibility_cutoff_xid =
xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
+ Buffer buf;
+ blkno = retry_pages[i];
- /*
- * If tuple is recently deleted then we
must not remove it
- * from relation.
- */
- nkeep += 1;
- all_visible = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
- /* This is an expected case during
concurrent vacuum */
- all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during
concurrent vacuum */
- all_visible = false;
- break;
- default:
- elog(ERROR, "unexpected
HeapTupleSatisfiesVacuum result");
- break;
- }
-
- if (tupgone)
- {
- lazy_record_dead_tuple(vacrelstats,
&(tuple.t_self));
-
HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
-
&vacrelstats->latestRemovedXid);
- tups_vacuumed += 1;
- has_dead_tuples = true;
- }
- else
- {
- num_tuples += 1;
- hastup = true;
-
- /*
- * Each non-removable tuple must be checked to
see if it needs
- * freezing. Note we already have exclusive
buffer lock.
- */
- if (heap_prepare_freeze_tuple(tuple.t_data,
FreezeLimit,
-
MultiXactCutoff, &frozen[nfrozen]))
- frozen[nfrozen++].offset = offnum;
- }
- } /* scan along
page */
+ visibilitymap_pin(onerel, blkno, &vmbuffer);
- /*
- * If we froze any tuples, mark the buffer dirty, and write a
WAL
- * record recording the changes. We must log the changes to be
- * crash-safe against future truncation of CLOG.
- */
- if (nfrozen > 0)
- {
- START_CRIT_SECTION();
-
- MarkBufferDirty(buf);
+ buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
+ 			 RBM_NORMAL, vac_strategy);
- /* execute collected freezes */
- for (i = 0; i < nfrozen; i++)
+ /* We need buffer cleanup lock so that we can prune HOT chains. */
+ if (ConditionalLockBufferForCleanup(buf))
{
- ItemId itemid;
- HeapTupleHeader htup;
+ retry_page_count++;
- itemid = PageGetItemId(page, frozen[i].offset);
- htup = (HeapTupleHeader) PageGetItem(page,
itemid);
-
- heap_execute_freeze_tuple(htup, &frozen[i]);
- }
-
- /* Now WAL-log freezing if neccessary */
- if (RelationNeedsWAL(onerel))
+ lazy_scan_page(onerel, vacrelstats, blkno, buf, vmbuffer, frozen,
+ 		nindexes, visibilitymap_test(onerel, blkno, &vmbuffer),
+ 		&empty_pages, &vacuumed_pages, &nunused);
+ } else
{
- XLogRecPtr recptr;
-
- recptr = log_heap_freeze(onerel, buf,
FreezeLimit,
-
frozen, nfrozen);
- PageSetLSN(page, recptr);
+ retry_fail_count++;
}
-
- END_CRIT_SECTION();
- }
-
- /*
- * If there are no indexes then we can vacuum the page right now
- * instead of doing a second scan.
- */
- if (nindexes == 0 &&
- vacrelstats->num_dead_tuples > 0)
- {
- /* Remove tuples from heap */
- lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats,
&vmbuffer);
- has_dead_tuples = false;
-
- /*
- * Forget the now-vacuumed tuples, and press on, but be
careful
- * not to reset latestRemovedXid since we want that
value to be
- * valid.
- */
- vacrelstats->num_dead_tuples = 0;
- vacuumed_pages++;
- }
-
- freespace = PageGetHeapFreeSpace(page);
-
- /* mark page all-visible, if appropriate */
- if (all_visible && !all_visible_according_to_vm)
- {
- /*
- * It should never be the case that the visibility map
page is set
- * while the page-level bit is clear, but the reverse
is allowed
- * (if checksums are not enabled). Regardless, set the
both bits
- * so that we get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit
is not set,
- * we don't need to dirty the heap page. However, if
checksums
- * are enabled, we do need to make sure that the heap
page is
- * dirtied before passing it to visibilitymap_set(),
because it
- * may be logged. Given that this situation should
only happen in
- * rare cases after a crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer,
visibility_cutoff_xid);
- }
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be
set if
- * the page-level bit is clear. However, it's possible that
the bit
- * got cleared after we checked it and before we took the buffer
- * content lock, so we must recheck before jumping to the
conclusion
- * that something bad has happened.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page)
- && visibilitymap_test(onerel, blkno,
&vmbuffer))
- {
- elog(WARNING, "page is not marked all-visible but
visibility map bit is set in relation \"%s\" page %u",
- relname, blkno);
- visibilitymap_clear(onerel, blkno, vmbuffer);
}
-
- /*
- * It's possible for the value returned by GetOldestXmin() to
move
- * backwards, so it's not wrong for us to see tuples that
appear to
- * not be visible to everyone yet, while PD_ALL_VISIBLE is
already
- * set. The real safe xmin value never moves backwards, but
- * GetOldestXmin() is conservative and sometimes returns a value
- * that's unnecessarily small, so if we see that contradiction
it just
- * means that the tuples that we think are not visible to
everyone yet
- * actually are, and the PD_ALL_VISIBLE flag is correct.
- *
- * There should never be dead tuples on a page with
PD_ALL_VISIBLE
- * set, however.
- */
- else if (PageIsAllVisible(page) && has_dead_tuples)
- {
- elog(WARNING, "page containing dead tuples is marked as
all-visible in relation \"%s\" page %u",
- relname, blkno);
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(onerel, blkno, vmbuffer);
- }
-
- UnlockReleaseBuffer(buf);
-
- /* Remember the location of the last page with nonremovable
tuples */
- if (hastup)
- vacrelstats->nonempty_pages = blkno + 1;
-
- /*
- * If we remembered any tuples for deletion, then the page will
be
- * visited again by lazy_vacuum_heap, which will compute and
record
- * its post-compaction free space. If not, then we're done
with this
- * page, so remember its free space as-is. (This path will
always be
- * taken if there are no indexes.)
- */
- if (vacrelstats->num_dead_tuples == prev_dead_count)
- RecordPageWithFreeSpace(onerel, blkno, freespace);
}
pfree(frozen);
- /* save stats for use later */
- vacrelstats->scanned_tuples = num_tuples;
- vacrelstats->tuples_deleted = tups_vacuumed;
- vacrelstats->new_dead_tuples = nkeep;
/* now we can compute the new value for pg_class.reltuples */
vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
nblocks,
vacrelstats->scanned_pages,
- 					num_tuples);
+ 					vacrelstats->scanned_tuples);
/*
* Release any remaining pin on visibility map page.
@@ -1077,6 +728,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
lazy_vacuum_index(Irel[i],
&indstats[i],
vacrelstats);
+
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);
vacrelstats->num_index_scans++;
@@ -1091,21 +743,57 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
ereport(elevel,
(errmsg("\"%s\": removed %.0f row versions in
%u pages",
RelationGetRelationName(onerel),
- 				tups_vacuumed, vacuumed_pages)));
+ 				vacrelstats->tuples_deleted, vacuumed_pages)));
- ereport(elevel,
- 		(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
- 				RelationGetRelationName(onerel),
- 				tups_vacuumed, num_tuples,
- 				vacrelstats->scanned_pages, nblocks),
- 		 errdetail("%.0f dead row versions cannot be removed yet.\n"
- 				   "There were %.0f unused item pointers.\n"
- 				   "%u pages are entirely empty.\n"
- 				   "%s.",
- 				   nkeep,
- 				   nunused,
- 				   empty_pages,
- 				   pg_rusage_show(&ru0))));
+ if (retry_page_count || retry_fail_count || retry_pages_skipped)
+ 	ereport(elevel,
+ 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
+ 					RelationGetRelationName(onerel),
+ 					vacrelstats->tuples_deleted, vacrelstats->scanned_tuples,
+ 					vacrelstats->scanned_pages, nblocks),
+ 			 errdetail("%.0f dead row versions cannot be removed yet.\n"
+ 					   "There were %.0f unused item pointers.\n"
+ 					   "%u pages are entirely empty.\n"
+ 					   "Retried cleanup lock on %.0f pages, retry failed on %.0f, skipped retry on %.0f.\n"
+ 					   "%s.",
+ 					   vacrelstats->new_dead_tuples,
+ 					   nunused,
+ 					   empty_pages,
+ 					   retry_page_count, retry_fail_count, retry_pages_skipped,
+ 					   pg_rusage_show(&ru0))
+ 			));
+ else if (cleanup_lock_waits)
+ 	ereport(elevel,
+ 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
+ 					RelationGetRelationName(onerel),
+ 					vacrelstats->tuples_deleted, vacrelstats->scanned_tuples,
+ 					vacrelstats->scanned_pages, nblocks),
+ 			 errdetail("%.0f dead row versions cannot be removed yet.\n"
+ 					   "There were %.0f unused item pointers.\n"
+ 					   "%u pages are entirely empty.\n"
+ 					   "Waited for cleanup lock on %.0f pages.\n"
+ 					   "%s.",
+ 					   vacrelstats->new_dead_tuples,
+ 					   nunused,
+ 					   empty_pages,
+ 					   cleanup_lock_waits,
+ 					   pg_rusage_show(&ru0))
+ 			));
+ else
+ 	ereport(elevel,
+ 			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
+ 					RelationGetRelationName(onerel),
+ 					vacrelstats->tuples_deleted, vacrelstats->scanned_tuples,
+ 					vacrelstats->scanned_pages, nblocks),
+ 			 errdetail("%.0f dead row versions cannot be removed yet.\n"
+ 					   "There were %.0f unused item pointers.\n"
+ 					   "%u pages are entirely empty.\n"
+ 					   "%s.",
+ 					   vacrelstats->new_dead_tuples,
+ 					   nunused,
+ 					   empty_pages,
+ 					   pg_rusage_show(&ru0))
+ 			));
}
@@ -1175,6 +863,436 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
errdetail("%s.",
pg_rusage_show(&ru0))));
}
+/*
+ * lazy_scan_page() - scan a single page for dead tuples
+ *
+ * This is broken out from lazy_scan_heap() so that we can retry cleaning pages
+ * that we couldn't get the cleanup lock on. Caller must have a cleanup lock on
+ * the heap buffer (buf), and have the appropriate visibility map buffer
+ * (vmbuffer) pinned.
+ *
+ */
+static void
+lazy_scan_page(Relation onerel, LVRelStats *vacrelstats,
+ 			   BlockNumber blkno, Buffer buf, Buffer vmbuffer, xl_heap_freeze_tuple *frozen,
+ 			   int nindexes, bool all_visible_according_to_vm,
+ 			   BlockNumber *empty_pages, BlockNumber *vacuumed_pages, double *nunused)
+{
+ int nfrozen = 0;
+ int i;
+ Page page;
+ OffsetNumber offnum,
+ maxoff;
+ HeapTupleData tuple;
+ bool all_visible = true;
+ bool has_dead_tuples = false;
+ bool hastup = false;
+ bool tupgone;
+ char *relname = RelationGetRelationName(onerel);
+ Size freespace;
+ TransactionId visibility_cutoff_xid = InvalidTransactionId;
+ int prev_dead_count = vacrelstats->num_dead_tuples;
+
+ /*
+ * I don't see a way to check onerel against buf or vmbuffer without
+ * BufferGetTag, which seems like overkill.
+ */
+ Assert(BufferGetBlockNumber(buf) == blkno);
+ Assert(visibilitymap_pin_ok(blkno, vmbuffer));
+
+ vacrelstats->scanned_pages++;
+
+ page = BufferGetPage(buf);
+
+ if (PageIsNew(page))
+ {
+ /*
+ * An all-zeroes page could be left over if a backend extends
the
+ * relation but crashes before initializing the page. Reclaim
such
+ * pages for use.
+ *
+ * We have to be careful here because we could be looking at a
+ * page that someone has just added to the relation and not yet
+ * been able to initialize (see RelationGetBufferForTuple). To
+ * protect against that, release the buffer lock, grab the
+ * relation extension lock momentarily, and re-lock the buffer.
If
+ * the page is still uninitialized by then, it must be left over
+ * from a crashed backend, and we can initialize it.
+ *
+ * We don't really need the relation lock when this is a new or
+ * temp relation, but it's probably not worth the code space to
+ * check that, since this surely isn't a critical path.
+ *
+ * Note: the comparable code in vacuum.c need not worry because
+ * it's got exclusive lock on the whole relation.
+ */
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ LockRelationForExtension(onerel, ExclusiveLock);
+ UnlockRelationForExtension(onerel, ExclusiveLock);
+ LockBufferForCleanup(buf);
+ if (PageIsNew(page))
+ {
+ ereport(WARNING,
+ (errmsg("relation \"%s\" page %u is uninitialized ---
fixing",
+ relname, blkno)));
+ PageInit(page, BufferGetPageSize(buf), 0);
+ (*empty_pages)++;
+ }
+ freespace = PageGetHeapFreeSpace(page);
+ MarkBufferDirty(buf);
+ UnlockReleaseBuffer(buf);
+
+ RecordPageWithFreeSpace(onerel, blkno, freespace);
+ return;
+ }
+
+ if (PageIsEmpty(page))
+ {
+ (*empty_pages)++;
+ freespace = PageGetHeapFreeSpace(page);
+
+ /* empty pages are always all-visible */
+ if (!PageIsAllVisible(page))
+ {
+ START_CRIT_SECTION();
+
+ /* mark buffer dirty before writing a WAL record */
+ MarkBufferDirty(buf);
+
+ /*
+ * It's possible that another backend has extended the
heap,
+ * initialized the page, and then failed to WAL-log the
page
+ * due to an ERROR. Since heap extension is not
WAL-logged,
+ * recovery might try to replay our record setting the
page
+ * all-visible and find that the page isn't
initialized, which
+ * will cause a PANIC. To prevent that, check whether
the
+ * page has been previously WAL-logged, and if not, do
that
+ * now.
+ */
+ if (RelationNeedsWAL(onerel) &&
+ PageGetLSN(page) == InvalidXLogRecPtr)
+ log_newpage_buffer(buf, true);
+
+ PageSetAllVisible(page);
+ visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
+ vmbuffer,
InvalidTransactionId);
+ END_CRIT_SECTION();
+ }
+
+ UnlockReleaseBuffer(buf);
+ RecordPageWithFreeSpace(onerel, blkno, freespace);
+ return;
+ }
+
+ /*
+ * Prune all HOT-update chains in this page.
+ *
+ * We count tuples removed by the pruning step as removed by VACUUM.
+ */
+ vacrelstats->tuples_deleted += heap_page_prune(onerel, buf, OldestXmin, false,
+ 						&vacrelstats->latestRemovedXid);
+
+ /*
+ * Now scan the page to collect vacuumable items and check for tuples
+ * requiring freezing.
+ */
+
+ /*
+ * Note: If you change anything in the loop below, also look at
+ * heap_page_is_all_visible to see if that needs to be changed.
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+
+ itemid = PageGetItemId(page, offnum);
+
+ /* Unused items require no processing, but we count 'em */
+ if (!ItemIdIsUsed(itemid))
+ {
+ *nunused += 1;
+ continue;
+ }
+
+ /* Redirect items mustn't be touched */
+ if (ItemIdIsRedirected(itemid))
+ {
+ hastup = true; /* this page won't be truncatable */
+ continue;
+ }
+
+ ItemPointerSet(&(tuple.t_self), blkno, offnum);
+
+ /*
+ * DEAD item pointers are to be vacuumed normally; but we don't
+ * count them in vacrelstats->tuples_deleted, else we'd be double-counting (at
+ * least in the common case where heap_page_prune() just freed up
+ * a non-HOT tuple).
+ */
+ if (ItemIdIsDead(itemid))
+ {
+ lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+ all_visible = false;
+ continue;
+ }
+
+ Assert(ItemIdIsNormal(itemid));
+
+ tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple.t_len = ItemIdGetLength(itemid);
+ tuple.t_tableOid = RelationGetRelid(onerel);
+
+ tupgone = false;
+
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Ordinarily, DEAD tuples would have been
removed by
+ * heap_page_prune(), but it's possible that
the tuple
+ * state changed since heap_page_prune()
looked. In
+ * particular an INSERT_IN_PROGRESS tuple could
have
+ * changed to DEAD if the inserter aborted. So
this
+ * cannot be considered an error condition.
+ *
+ * If the tuple is HOT-updated then it must
only be
+ * removed by a prune operation; so we keep it
just as if
+ * it were RECENTLY_DEAD. Also, if it's a
heap-only
+ * tuple, we choose to keep it, because it'll
be a lot
+ * cheaper to get rid of it in the next pruning
pass than
+ * to treat it like an indexed tuple.
+ */
+ if (HeapTupleIsHotUpdated(&tuple) ||
+ HeapTupleIsHeapOnly(&tuple))
+ vacrelstats->new_dead_tuples += 1;
+ else
+ tupgone = true; /* we can delete the
tuple */
+ all_visible = false;
+ break;
+ case HEAPTUPLE_LIVE:
+ /* Tuple is good --- but let's do some validity
checks */
+ if (onerel->rd_rel->relhasoids &&
+ !OidIsValid(HeapTupleGetOid(&tuple)))
+ elog(WARNING, "relation \"%s\" TID
%u/%u: OID is invalid",
+ relname, blkno, offnum);
+
+ /*
+ * Is the tuple definitely visible to all
transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't
set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more
info. Check
+ * that the tuple is hinted xmin-committed
because of
+ * that.
+ */
+ if (all_visible)
+ {
+ TransactionId xmin;
+
+ if
(!HeapTupleHeaderXminCommitted(tuple.t_data))
+ {
+ all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed.
But is it old
+ * enough that everyone sees it as
committed?
+ */
+ xmin =
HeapTupleHeaderGetXmin(tuple.t_data);
+ if (!TransactionIdPrecedes(xmin,
OldestXmin))
+ {
+ all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin,
visibility_cutoff_xid))
+ visibility_cutoff_xid = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently deleted then we must
not remove it
+ * from relation.
+ */
+ vacrelstats->new_dead_tuples += 1;
+ all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ /* This is an expected case during concurrent
vacuum */
+ all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent
vacuum */
+ all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected
HeapTupleSatisfiesVacuum result");
+ break;
+ }
+
+ if (tupgone)
+ {
+ lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+ HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
+
&vacrelstats->latestRemovedXid);
+ vacrelstats->tuples_deleted += 1;
+ has_dead_tuples = true;
+ }
+ else
+ {
+ vacrelstats->scanned_tuples += 1;
+ hastup = true;
+
+ /*
+ * Each non-removable tuple must be checked to see if
it needs
+ * freezing. Note we already have exclusive buffer
lock.
+ */
+ if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
+
MultiXactCutoff, &frozen[nfrozen]))
+ frozen[nfrozen++].offset = offnum;
+ }
+ } /* scan along page */
+
+ /*
+ * If we froze any tuples, mark the buffer dirty, and write a WAL
+ * record recording the changes. We must log the changes to be
+ * crash-safe against future truncation of CLOG.
+ */
+ if (nfrozen > 0)
+ {
+ START_CRIT_SECTION();
+
+ MarkBufferDirty(buf);
+
+ /* execute collected freezes */
+ for (i = 0; i < nfrozen; i++)
+ {
+ ItemId itemid;
+ HeapTupleHeader htup;
+
+ itemid = PageGetItemId(page, frozen[i].offset);
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ heap_execute_freeze_tuple(htup, &frozen[i]);
+ }
+
+ /* Now WAL-log freezing if neccessary */
+ if (RelationNeedsWAL(onerel))
+ {
+ XLogRecPtr recptr;
+
+ recptr = log_heap_freeze(onerel, buf, FreezeLimit,
+
frozen, nfrozen);
+ PageSetLSN(page, recptr);
+ }
+
+ END_CRIT_SECTION();
+ }
+
+ /*
+ * If there are no indexes then we can vacuum the page right now
+ * instead of doing a second scan.
+ */
+ if (nindexes == 0 &&
+ vacrelstats->num_dead_tuples > 0)
+ {
+ /* Remove tuples from heap */
+ lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+ has_dead_tuples = false;
+
+ /*
+ * Forget the now-vacuumed tuples, and press on, but be careful
+ * not to reset latestRemovedXid since we want that value to be
+ * valid.
+ */
+ vacrelstats->num_dead_tuples = 0;
+ (*vacuumed_pages)++;
+ }
+
+ freespace = PageGetHeapFreeSpace(page);
+
+ /* mark page all-visible, if appropriate */
+ if (all_visible && !all_visible_according_to_vm)
+ {
+ /*
+ * It should never be the case that the visibility map page is
set
+ * while the page-level bit is clear, but the reverse is allowed
+ * (if checksums are not enabled). Regardless, set the both
bits
+ * so that we get back in sync.
+ *
+ * NB: If the heap page is all-visible but the VM bit is not
set,
+ * we don't need to dirty the heap page. However, if checksums
+ * are enabled, we do need to make sure that the heap page is
+ * dirtied before passing it to visibilitymap_set(), because it
+ * may be logged. Given that this situation should only happen
in
+ * rare cases after a crash, it is not worth optimizing.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+ visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
+ vmbuffer,
visibility_cutoff_xid);
+ }
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if
+ * the page-level bit is clear. However, it's possible that the bit
+ * got cleared after we checked it and before we took the buffer
+ * content lock, so we must recheck before jumping to the conclusion
+ * that something bad has happened.
+ */
+ else if (all_visible_according_to_vm && !PageIsAllVisible(page)
+ && visibilitymap_test(onerel, blkno, &vmbuffer))
+ {
+ elog(WARNING, "page is not marked all-visible but visibility
map bit is set in relation \"%s\" page %u",
+ relname, blkno);
+ visibilitymap_clear(onerel, blkno, vmbuffer);
+ }
+
+ /*
+ * It's possible for the value returned by GetOldestXmin() to move
+ * backwards, so it's not wrong for us to see tuples that appear to
+ * not be visible to everyone yet, while PD_ALL_VISIBLE is already
+ * set. The real safe xmin value never moves backwards, but
+ * GetOldestXmin() is conservative and sometimes returns a value
+ * that's unnecessarily small, so if we see that contradiction it just
+ * means that the tuples that we think are not visible to everyone yet
+ * actually are, and the PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be dead tuples on a page with PD_ALL_VISIBLE
+ * set, however.
+ */
+ else if (PageIsAllVisible(page) && has_dead_tuples)
+ {
+ elog(WARNING, "page containing dead tuples is marked as
all-visible in relation \"%s\" page %u",
+ relname, blkno);
+ PageClearAllVisible(page);
+ MarkBufferDirty(buf);
+ visibilitymap_clear(onerel, blkno, vmbuffer);
+ }
+
+ UnlockReleaseBuffer(buf);
+
+ /* Remember the location of the last page with nonremovable tuples */
+ if (hastup)
+ vacrelstats->nonempty_pages = blkno + 1;
+
+ /*
+ * If we remembered any tuples for deletion, then the page will be
+ * visited again by lazy_vacuum_heap, which will compute and record
+ * its post-compaction free space. If not, then we're done with this
+ * page, so remember its free space as-is. (This path will always be
+ * taken if there are no indexes.)
+ */
+ if (vacrelstats->num_dead_tuples == prev_dead_count)
+ RecordPageWithFreeSpace(onerel, blkno, freespace);
+}
/*
* lazy_vacuum_page() -- free dead tuples on a page
--
2.1.2