Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

Fujii Masao Sun, 09 Dec 2012 07:36:59 -0800

On Thu, Dec 6, 2012 at 8:39 PM, Amit Kapila <[email protected]> wrote:
> On Thursday, December 06, 2012 9:35 AM Kyotaro HORIGUCHI wrote:
>> Hello, I have a problem with PostgreSQL 9.2 with Pacemaker.
>>
>> HA standby sometime failes to start under normal operation.
>>
>> Testing with a bare replication pair showed that the standby failes
>> startup recovery under the operation sequence shown below. 9.3dev too,
>> but 9.1 does not have this problem. This problem became apparent by the
>> invalid-page check of xlog, but
>> 9.1 also has same glitch potentially.
>>
>> After the investigation, the lag of minRecoveryPoint behind EndRecPtr in
>> redo loop seems to be the cause. The lag brings about repetitive redoing
>> of unrepeatable xlog sequences such as XLOG_HEAP2_VISIBLE ->
>> SMGR_TRUNCATE on the same page. So I did the same aid work as
>> xact_redo_commit_internal for smgr_redo. While doing this, I noticed
>> that
>> CheckRecoveryConsistency() in redo apply loop should be after redoing
>> the record, so moved it.
>
> I think moving CheckRecoveryConsistency() after redo apply loop might cause
> a problem.
> As currently it is done before recoveryStopsHere() function, which can allow
> connections
> on HOTSTANDY. But now if due to some reason recovery pauses or stops due to
> above function,
> connections might not be allowed as CheckRecoveryConsistency() is not
> called.


Yes, so we should just add the CheckRecoveryConsistency() call after
rm_redo rather than moving it? This issue is related to the old discussion:
http://archives.postgresql.org/pgsql-bugs/2012-09/msg00101.php

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG?] lag of minRecoveryPont in archive recovery

Reply via email to