On 2013-05-29 03:56:38 +0200, Andres Freund wrote:
> On 2013-05-28 21:36:17 -0400, Robert Haas wrote:
> > On Tue, May 28, 2013 at 1:58 PM, Andres Freund <and...@2ndquadrant.com> 
> > wrote:
> > > Guessing around I looked and noticed the following problematic pattern:
> > > 1) A: wants to do an update, doesn't have enough freespace
> > > 2) A: extends the relation on the filesystem level 
> > > (RelationGetBufferForTuple)
> > > 3) A: does PageInit (RelationGetBufferForTuple)
> > > 4) A: aborts, e.g. due to a serialization failure (heap_update)
> > >
> > > At this point the page is initialized in memory, but not wal logged. It
> > > isn't pinned or locked either.
> > >
> > > 5) B: vacuum finds that page and it's empty. So it marks it all
> > > visible. But since the page wasn't written out (we haven't even marked
> > > it dirty in 3.) the standby doesn't know that and reports the page as
> > > being uninitialized.
> > >
> > > ISTM the best backbranchable fix for this is to teach lazy_scan_heap to
> > > log an FPI for the heap page via visibilitymap_set in that rather
> > > limited case.
> > >
> > > Happy to provide a patch unless somebody has a better idea?
> >
> > Good catch.  However, I would suggest using log_newpage() before
> > visibilitymap_set() rather than trying to stick extra logic into
> > visibilitymap_set().  I think that will be cleaner and simpler.
>
> Thought about that, but given that 9.3's visibilitymap_set already will
> already FPI heap pages I concluded it wouldn't really be an improvement
> since it's only one ||log_heap_page or so there. Not sure what's
> better. Will write the patch and see how it goes.

Ended up using log_newpage_buffer since reusing visibilitymap_set's
record would break the wal format as we currently do not accept an FPI
on the heap pages during replay when < 9.3. Forcing to upgrade the
client first would be rather unfriendly...

That has the disadvantage of logging a full heap page since it doesn't
use the hole optimization but this happens really infrequently, so ...

I don't think this can happen < 9.2 in at least this code path. I wonder
though if there are other codepaths were this can happen although I
haven't found any on a cursory inspection.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 62e1c6fc679af80f8f4d26f1fe9d2fae8f430100 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Wed, 29 May 2013 15:37:05 +0200
Subject: [PATCH] Ensure that all_visible WAL records operate on an intialized
 page

There was a race condition were lazy_scan_heap could cause all_visible records
to be emitted although the page marked all visible wasn't logged in an
initialized state yet causing a PANIC during WAL replay.
---
 src/backend/commands/vacuumlazy.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 9d30415..4eff3c1 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -663,6 +663,31 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			/* empty pages are always all-visible */
 			if (!PageIsAllVisible(page))
 			{
+				/*
+				 * There's a race condition here we need to prevent: Another
+				 * backend might have extended the heap and PageInit'ed but
+				 * ERRORed before filling the page with actual content.
+				 *
+				 * Since relation extension is not WAL logged (and we don't
+				 * even dirty the buffer) it's quite possible that the page's
+				 * content hasn't made it to disk yet, especially on a
+				 * standby. During WAL replay we then could find an
+				 * uninitialized page when trying to set the all visible flag
+				 * which will cause a PANIC lateron.
+				 *
+				 * If there already is a WAL record covering this page
+				 * (i.e. its LSN is valid) the respective record it will
+				 * initialize the page, so we don't need to do anything in that
+				 * case.
+				 *
+				 * XXX: It would be nice to use a logging method supporting
+				 * standard buffers here since log_newpage_buffer() will write
+				 * the full block instead of omitting the hole.
+				 */
+				if (RelationNeedsWAL(onerel) &&
+					PageGetLSN(page) == InvalidXLogRecPtr)
+					log_newpage_buffer(buf);
+
 				PageSetAllVisible(page);
 				MarkBufferDirty(buf);
 				visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
-- 
1.7.10.4

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to