Hi,
In our usage scenario the standby node can be OOM-killed, and we then have to
create a new standby node.
If the master node has a long-running uncommitted transaction, the new standby
node cannot provide service.
So for us this is a critical issue.


I would appreciate any suggestions on this issue.
Could anyone help review the attached patch?
Thanks.






At 2019-10-22 20:42:21, "Thunder" <thund...@126.com> wrote:

I have updated the patch.

1. The STANDBY_SNAPSHOT_PENDING state is set when we replay the first
XLOG_RUNNING_XACTS record and the subtransaction IDs have overflowed.
2. When we log XLOG_RUNNING_XACTS on the master node, can we assume that all
xact IDs < oldestRunningXid are finished?
3. If we can assume this, then when we replay XLOG_RUNNING_XACTS and change
standbyState to STANDBY_SNAPSHOT_PENDING, can we record oldestRunningXid in a
shared variable, such as procArray->oldest_running_xid?
4. On the standby node, when GetSnapshotData is called and
procArray->oldest_running_xid is valid, can we set xmin to
procArray->oldest_running_xid?
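
To make points 3 and 4 concrete, here is a minimal standalone sketch. It is
not the attached patch and not real PostgreSQL code: the Model* types and
model_* functions are hypothetical stand-ins, oldest_running_xid is the
proposed (not existing) field, and XID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Simplified model of the standby states named in this thread. */
typedef enum
{
    STANDBY_INITIALIZED,
    STANDBY_SNAPSHOT_PENDING,
    STANDBY_SNAPSHOT_READY
} ModelStandbyState;

/* Hypothetical stand-in for the shared procArray structure. */
typedef struct
{
    TransactionId oldest_running_xid;   /* proposed field from point 3 */
} ModelProcArray;

/* Simplified payload of an XLOG_RUNNING_XACTS record. */
typedef struct
{
    TransactionId oldestRunningXid;
    bool          subxid_overflow;
} ModelRunningXacts;

/*
 * Point 3: when replay of a running-xacts record leaves the standby in
 * STANDBY_SNAPSHOT_PENDING, remember the record's oldestRunningXid.
 */
ModelStandbyState
model_apply_running_xacts(ModelProcArray *arr, const ModelRunningXacts *running)
{
    assert(TransactionIdIsValid(running->oldestRunningXid));

    if (running->subxid_overflow)
    {
        arr->oldest_running_xid = running->oldestRunningXid;
        return STANDBY_SNAPSHOT_PENDING;
    }

    /* No overflow: the remembered value is not needed. */
    arr->oldest_running_xid = InvalidTransactionId;
    return STANDBY_SNAPSHOT_READY;
}

/*
 * Point 4: when taking a snapshot, use the remembered XID as xmin if it is
 * valid; otherwise keep the xmin computed in the usual way.
 */
TransactionId
model_snapshot_xmin(const ModelProcArray *arr, TransactionId computed_xmin)
{
    if (TransactionIdIsValid(arr->oldest_running_xid))
        return arr->oldest_running_xid;
    return computed_xmin;
}
```

The sketch only covers where xmin would come from. It does not show how a
snapshot would decide which XIDs at or above that value are still running
while the subtransaction information is incomplete, which is the part the
assumption in point 2 would have to cover.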

I would appreciate any suggestions on this issue.





At 2019-10-22 01:27:58, "Robert Haas" <robertmh...@gmail.com> wrote:
>On Mon, Oct 21, 2019 at 4:13 AM Thunder <thund...@126.com> wrote:
>> Can we fix this issue like the following patch?
>>
>> $git diff src/backend/access/transam/xlog.c
>> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
>> index 49ae97d4459..0fbdf6fd64a 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -8365,7 +8365,7 @@ CheckRecoveryConsistency(void)
>>          * run? If so, we can tell postmaster that the database is consistent now,
>>          * enabling connections.
>>          */
>> -       if (standbyState == STANDBY_SNAPSHOT_READY &&
>> +       if ((standbyState == STANDBY_SNAPSHOT_READY || standbyState == STANDBY_SNAPSHOT_PENDING) &&
>>                 !LocalHotStandbyActive &&
>>                 reachedConsistency &&
>>                 IsUnderPostmaster)
>
>I think that the issue you've encountered is design behavior.  In
>other words, it's intended to work that way.
>
>The comments for the code you propose to change say that we can allow
>connections once we've got a valid snapshot. So presumably the effect
>of your change would be to allow connections even though we don't have
>a valid snapshot.
>
>That seems bad.
>
>-- 
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise PostgreSQL Company





 

Attachment: standby_service.patch