On Fri, Jun 24, 2016 at 11:04 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Jun 24, 2016 at 4:33 AM, Andres Freund <and...@anarazel.de> wrote:
>> On 2016-06-23 18:59:57 -0400, Alvaro Herrera wrote:
>>> Andres Freund wrote:
>>>
>>> > I'm looking into three approaches right now:
>>> >
>>> > 3) Use WAL logging for the already_marked = true case.
>>>
>>> > 3) This approach so far seems the best. It's possible to reuse the
>>> > xl_heap_lock record (in an afaics backwards compatible manner), and in
>>> > most cases the overhead isn't that large. It's of course annoying to
>>> > emit more WAL, but it's not that big an overhead compared to extending a
>>> > file, or to toasting. It's also by far the simplest fix.
>>> >
>
> +1 for proceeding with Approach-3.
>
>>> I suppose it's fine if we crash midway from emitting this wal record and
>>> the actual heap_update one, since the xmax will appear to come from an
>>> aborted xid, right?
>>
>> Yea, that should be fine.
>>
>>> I agree that the overhead is probably negligible, considering that this
>>> only happens when toast is invoked. It's probably not as great when the
>>> new tuple goes to another page, though.
>>
>> I think it has to happen in both cases unfortunately. We could try to
>> add some optimizations (e.g. only release lock & WAL log if the target
>> page, via fsm, is before the current one), but I don't really want to go
>> there in the back branches.
>>
>
> You are right, I think we can try such an optimization in Head and
> that too if we see a performance hit with adding this new WAL in
> heap_update.
>
+1 for approach #3; a draft patch for it is attached. I think the attached
patch fixes this problem, but please let me know if it is not what you had
in mind.

Regards,

--
Masahiko Sawada
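For reference, the existing WAL record the patch reuses looks roughly like
this in the 9.6-era sources (paraphrased from memory of
src/include/access/heapam_xlog.h; check the header for the authoritative
layout):

typedef struct xl_heap_lock
{
	TransactionId locking_xid;	/* might be a MultiXactId, not an xid */
	OffsetNumber offnum;		/* locked tuple's offset on the page */
	int8		infobits_set;	/* infomask bits to set on replay */
} xl_heap_lock;

#define SizeOfHeapLock	(offsetof(xl_heap_lock, infobits_set) + sizeof(int8))

Because the record is reused unchanged, the existing heap_xlog_lock() redo
routine handles replay, so no new WAL record type is needed.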
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 57da57a..2f3fd83 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3923,6 +3923,28 @@ l2:
 
 	if (need_toast || newtupsize > pagefree)
 	{
+		/*
+		 * To prevent data corruption caused by other backends updating the
+		 * old tuple after we have released the buffer, we need to record
+		 * that the xmax of the old tuple is set, and clear the visibility
+		 * map bits if needed, before releasing the buffer.  We can reuse
+		 * xl_heap_lock for this purpose.  It should be fine even if we
+		 * crash midway between this record and the actual update later,
+		 * since the xmax will appear to come from an aborted xid.
+		 */
+		START_CRIT_SECTION();
+
+		/* Clear PD_ALL_VISIBLE flags */
+		if (PageIsAllVisible(BufferGetPage(buffer)))
+		{
+			all_visible_cleared = true;
+			PageClearAllVisible(BufferGetPage(buffer));
+			visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
+								vmbuffer);
+		}
+
+		MarkBufferDirty(buffer);
+
 		/* Clear obsolete visibility flags ... */
 		oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
 		oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
@@ -3936,6 +3958,26 @@ l2:
 		/* temporarily make it look not-updated */
 		oldtup.t_data->t_ctid = oldtup.t_self;
 		already_marked = true;
+
+		if (RelationNeedsWAL(relation))
+		{
+			xl_heap_lock xlrec;
+			XLogRecPtr	recptr;
+
+			XLogBeginInsert();
+			XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+
+			xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
+			xlrec.locking_xid = xid;
+			xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
+												  oldtup.t_data->t_infomask2);
+			XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
+			recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
+			PageSetLSN(page, recptr);
+		}
+
+		END_CRIT_SECTION();
+
 		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 		/*
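As for the HEAD-only optimization discussed upthread (skipping the extra
record when the buffer lock never has to be given up), a very rough sketch
of one way it could look is below. This is not part of the attached patch;
the use of GetPageWithFreeSpace() and the exact condition are assumptions
for illustration only, not a tested implementation.

	/*
	 * Hypothetical sketch: only emit the pre-release xl_heap_lock record
	 * when we actually expect to release the buffer lock.  Toasting always
	 * releases it; otherwise consult the FSM for the likely target page.
	 */
	bool		must_prelog = need_toast;

	if (!must_prelog && newtupsize > pagefree)
	{
		BlockNumber curblk = BufferGetBlockNumber(buffer);
		BlockNumber fsmblk = GetPageWithFreeSpace(relation, newtupsize);

		/*
		 * Per the upthread suggestion, the extra record is only needed when
		 * the lock must be dropped, e.g. when the FSM points at a page
		 * before the current one and the block-number lock ordering forces
		 * us to release the current buffer first.
		 */
		if (fsmblk == InvalidBlockNumber || fsmblk < curblk)
			must_prelog = true;
	}

	if (must_prelog)
	{
		/* ... emit the xl_heap_lock record exactly as in the patch above ... */
	}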