Hi Steven

I made a patch as a trial.
https://github.com/t-matsuo/resource-agents/commit/bd3b587c6665c4f5eba0491b91f83965e601bb6b#heartbeat/pgsql

This patch never shows "STREAMING|POTENTIAL".
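
For anyone following along: as far as I can tell, "STREAMING|POTENTIAL"
is derived from the state and sync_state columns of pg_stat_replication
on the master. You can check it by hand with something like this (a
minimal sketch; the application_name values depend on your
recovery.conf):

  -- run on the master; sync_state is 'sync', 'potential', or 'async'
  SELECT application_name, state, sync_state FROM pg_stat_replication;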

Thanks,
Takatoshi MATSUO

2013/4/4 Takatoshi MATSUO <matsuo....@gmail.com>:
> Hi Steven
>
> Sorry for the late reply.
>
> 2013/3/29 Steven Bambling <smbambl...@arin.net>:
>> Takatoshi/Rainer, thanks so much for the quick responses and clarification.
>>
>> In response to the rep_mode being set to sync.
>>
>> If the master is running the monitor check as often as every 1s, then it is
>> updating the nodes with the "new" master preference in the event that the
>> current synchronous replica can't be reached and the postgres service has
>> selected the next node in the synchronous_standby_names list to perform
>> the synchronous transaction with.
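>>
>> (An illustration of that list behavior; node names are just examples.
>> With the setting below, node2 is the sync standby and node3 shows as
>> "potential"; if node2 drops, postgres silently makes node3 the sync
>> standby.)
>>
>>     # postgresql.conf on the master
>>     synchronous_standby_names = 'node2,node3'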
>>
>> If you are doing multiple transactions a second, then doesn't it become
>> possible for the postgres service to switch its synchronous replication
>> replica (from node2 to node3, for instance) and then fail (though I think
>> the risk is small) before the monitor function is invoked to update the
>> master preference?
>>
>> In this case you've committed transactions and reported back to your
>> application that they were successful, but when the new master is promoted
>> it doesn't have the committed transactions, because they are located on the
>> other replica (and on the failed master). Basically you've lost those
>> transactions even though they were reported successful.
>
> Yes!
> I didn't consider this situation.
>
>>
>> The only way I can see to get around this would be to compare the current
>> xlog locations on each of the remaining replicas, then promote the one that
>> meets your business needs:
>>         1. If you need greater data consistency:
>>                 - promote the node that has the furthest log location, even
>> IF the records haven't been replayed yet and there is some "recovery"
>> period.
>>
>>         2. If you need greater "up time":
>>                 - promote the node that has the furthest log location,
>> taking into account the replay lag
>>                         - i.e. promote the node with the furthest (or
>> nearly furthest) log location and the LEAST replay lag (see the query
>> sketched below).
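>>
>> (A sketch of that comparison, run on each remaining replica; these are
>> the 9.x function names:)
>>
>>     -- how far WAL has been received vs. actually replayed
>>     SELECT pg_last_xlog_receive_location(),
>>            pg_last_xlog_replay_location();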
>
> How do slaves get the "up time" information?
> I think slaves can't know the replay lag.
>
>> Does this even seem possible with a resource agent or is my thinking totally 
>> off?
>
> Methods 1 and 2 may cause data loss.
> If you can accept data loss, you can use "rep_mode=async";
> it's about the same as method 1.
>
>
> How about setting only one node in "synchronous_standby_names", so that
> the synchronous replica is never switched and this data loss is avoided?
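>
> (A minimal sketch of what I mean; "node2" is only an example name. With
> a single name listed, postgres never switches the synchronous standby;
> if node2 fails, commits block rather than silently continuing on
> another node.)
>
>     # postgresql.conf on the master
>     synchronous_standby_names = 'node2'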
>
>
> Thanks,
> Takatoshi MATSUO
