On Fri, Jun 24, 2016 at 11:04 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Jun 24, 2016 at 4:33 AM, Andres Freund <and...@anarazel.de> wrote:
>> On 2016-06-23 18:59:57 -0400, Alvaro Herrera wrote:
>>> Andres Freund wrote:
>>>
>>> > I'm looking into three approaches right now:
>>> >
>>> > 3) Use WAL logging for the already_marked = true case.
>>>
>>> > 3) This approach so far seems the best. It's possible to reuse the
>>> > xl_heap_lock record (in an afaics backwards compatible manner), and in
>>> > most cases the overhead isn't that large. It's of course annoying to
>>> > emit more WAL, but it's not that big an overhead compared to extending a
>>> > file, or to toasting. It's also by far the simplest fix.
>>> >
>
> +1 for proceeding with Approach-3.
>
>>> I suppose it's fine if we crash midway from emitting this wal record and
>>> the actual heap_update one, since the xmax will appear to come from an
>>> aborted xid, right?
>>
>> Yea, that should be fine.
>>
>>> I agree that the overhead is probably negligible, considering that this
>>> only happens when toast is invoked. It's probably not as great when the
>>> new tuple goes to another page, though.
>>
>> I think it has to happen in both cases unfortunately. We could try to
>> add some optimizations (e.g. only release lock & WAL log if the target
>> page, via fsm, is before the current one), but I don't really want to go
>> there in the back branches.
>>
>
> You are right, I think we can try such an optimization in Head and
> that too if we see a performance hit with adding this new WAL in
> heap_update.
>
+1 for approach #3; a draft patch for it is attached. I think the attached
patch fixes this problem, but please let me know if it is not what you had
in mind.

Regards,

--
Masahiko Sawada
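For reference, the existing WAL record the patch reuses looks roughly like
this in the 9.6-era sources (paraphrased from memory of
src/include/access/heapam_xlog.h; check the header for the authoritative
layout):

typedef struct xl_heap_lock
{
	TransactionId locking_xid;	/* might be a MultiXactId, not an xid */
	OffsetNumber offnum;		/* locked tuple's offset on the page */
	int8		infobits_set;	/* infomask bits to set on replay */
} xl_heap_lock;

#define SizeOfHeapLock	(offsetof(xl_heap_lock, infobits_set) + sizeof(int8))

Because the record is reused unchanged, the existing heap_xlog_lock() redo
routine handles replay, so no new WAL record type is needed.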
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 57da57a..2f3fd83 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3923,6 +3923,28 @@ l2:
 
 	if (need_toast || newtupsize > pagefree)
 	{
+		/*
+		 * To prevent data corruption caused by other backends updating the
+		 * old tuple after we have released the buffer, we need to record
+		 * that the xmax of the old tuple is set, and clear the visibility
+		 * map bits if needed, before releasing the buffer.  We can reuse
+		 * xl_heap_lock for this purpose.  It should be fine even if we
+		 * crash midway between this record and the actual update later,
+		 * since the xmax will appear to come from an aborted xid.
+		 */
+		START_CRIT_SECTION();
+
+		/* Clear PD_ALL_VISIBLE flags */
+		if (PageIsAllVisible(BufferGetPage(buffer)))
+		{
+			all_visible_cleared = true;
+			PageClearAllVisible(BufferGetPage(buffer));
+			visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
+								vmbuffer);
+		}
+
+		MarkBufferDirty(buffer);
+
 		/* Clear obsolete visibility flags ... */
 		oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
 		oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
@@ -3936,6 +3958,26 @@ l2:
 		/* temporarily make it look not-updated */
 		oldtup.t_data->t_ctid = oldtup.t_self;
 		already_marked = true;
+
+		if (RelationNeedsWAL(relation))
+		{
+			xl_heap_lock xlrec;
+			XLogRecPtr	recptr;
+
+			XLogBeginInsert();
+			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+
+			xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
+			xlrec.locking_xid = xid;
+			xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
+												  oldtup.t_data->t_infomask2);
+			XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
+			recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
+			PageSetLSN(page, recptr);
+		}
+
+		END_CRIT_SECTION();
+
 		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/*
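As for the HEAD-only optimization discussed upthread (skipping the extra
record when the buffer lock never has to be given up), a very rough sketch
of one way it could look is below. This is not part of the attached patch;
the use of GetPageWithFreeSpace() and the exact condition are assumptions
for illustration only, not a tested implementation.

	/*
	 * Hypothetical sketch: only emit the pre-release xl_heap_lock record
	 * when we actually expect to release the buffer lock.  Toasting always
	 * releases it; otherwise consult the FSM for the likely target page.
	 */
	bool		must_prelog = need_toast;

	if (!must_prelog && newtupsize > pagefree)
	{
		BlockNumber curblk = BufferGetBlockNumber(buffer);
		BlockNumber fsmblk = GetPageWithFreeSpace(relation, newtupsize);

		/*
		 * Per the upthread suggestion, the extra record is only needed when
		 * the lock must be dropped, e.g. when the FSM points at a page
		 * before the current one and the block-number lock ordering forces
		 * us to release the current buffer first.
		 */
		if (fsmblk == InvalidBlockNumber || fsmblk < curblk)
			must_prelog = true;
	}

	if (must_prelog)
	{
		/* ... emit the xl_heap_lock record exactly as in the patch above ... */
	}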