Re: [HACKERS] Relation cache invalidation on replica

Konstantin Knizhnik Fri, 26 Feb 2016 23:53:04 -0800

On 02/27/2016 04:16 AM, Simon Riggs wrote:

On 27 February 2016 at 00:33, Simon Riggs <[email protected] 
<mailto:[email protected]>> wrote:

    On 27 February 2016 at 00:29, Andres Freund <[email protected] 
<mailto:[email protected]>> wrote:

        On 2016-02-26 18:05:55 +0300, Konstantin Knizhnik wrote:
        > The reason of the problem is that invalidation messages are not 
delivered to
        > replica after the end of concurrent create index.
        > Invalidation messages are included in xlog as part of transaction 
commit
        > record.
        > Concurrent index create is split into three transaction, last of 
which is
        > just performing inplace update of index tuple, marking it as valid and
        > invalidating cache. But as far as this transaction is not assigned 
XID, no
        > transaction record is created in WAL and send to replicas. As a 
result,
        > replica doesn't receive this invalidation messages.

        Ugh, that's a fairly ugly bug.

    Looking now.

If the above is true, then the proposed fix wouldn't work either.

No point in sending a cache invalidation message on the standby if you haven't 
also written WAL, since the catalog re-read would just see the old row.

heap_inplace_update() does write WAL, which blows away the starting premise.

So I'm not seeing this as an extant bug in an open source version of 
PostgreSQL, in my current understanding.


Inplace update really creates record in WAL and this is why index is marked as 
valid at replica.
But invalidation messages are sent only with transaction commit record and such 
record is not created in this case,
because there is no assigned XID.

This is a real bug which originally observed by one of our customers with 
different versions of Postgres (last one them have tried was 9.5.1).
Then we reproduced it locally and determined the reason of the problem.
Repro scenario is very simple: you just need to create large enough table 
(pgbench with scale factor 100 works well in my case)
so that "create index concurrently" takes substantial amount of time. If, while 
this statement is in progress, you will execute some query at replica which
uses this index, then it will cache state of relation without index. And after 
even when index is actually constructed, it will never be used in this backend 
(but other backends at replica will use it).

I am not sure about the best way of fixing the problem.
I have not tested Andreas proposal:

if (nrels != 0 || nmsgs != 0 || RelcacheInitFileInval)

if it actually fixes the problem.
Assigning XID in heap_inplace_update definitely should work.
It is better than forcing assignment XID in DefineIndex? I am not sure, because 
this problem seems to be related only with concurrent update
(but may be I am wrong).
At least not all inplace updates should cause catalog invalidation and so 
require XID assignment.

--
Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Relation cache invalidation on replica

Reply via email to