Takatoshi/Rainer, thanks so much for the quick responses and clarification.

In response to rep_mode being set to sync:

If the master is running the monitor check as frequently as every 1s, then it's 
updating the nodes with the "new" master preference in the event that the 
current synchronous replica couldn't be reached and the postgres service then 
selected the next node in the synchronous_standby_names list to perform the 
synchronous transactions with.
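
For reference, the fallback I'm describing comes from listing more than one 
standby, roughly like this (node names are just placeholders and must match 
each standby's application_name):

    # postgresql.conf on the master (illustrative)
    synchronous_commit = on
    synchronous_standby_names = 'node2,node3'   # node2 is sync; if unreachable, node3 takes over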

If you are doing multiple transactions a second, then doesn't it become possible 
for the postgres service to switch its synchronous replica (from 
node2 to node3, for instance) and for the master to potentially fail (though I 
think the risk seems small) before the monitor function is invoked to update the 
master preference?

In this case you've committed one or more transactions and reported back to your 
application that they were successful, but when the new master is promoted it 
doesn't have the committed transactions because they are located on the other 
replica (and the failed master).  Basically you've lost these transactions 
even though they were reported successful.

The only way I can see getting around this would be to compare the current xlog 
locations on each of the remaining replicas, then promote the one that meets 
your business needs (a rough sketch follows the list below):
        1. If you need greater data consistency:
                - promote the node that has the furthest log location, even IF 
the records haven't been replayed yet and there is some "recovery" period.
        2. If you need greater "up time":
                - promote the node that has the furthest log location, taking 
the replay lag into account
                        - i.e. promote the node that has the furthest (or near 
furthest) log location and the LEAST replay lag.
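
Roughly, I'm picturing checking something like this on each surviving replica 
(just a sketch; these functions exist on 9.1):

    # run on each remaining replica (illustrative)
    psql -At -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"

The node reporting the highest receive location has the most WAL on disk, and 
the gap between the two values gives a rough idea of the replay lag for case 2.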

Does this even seem possible with a resource agent, or is my thinking totally 
off?
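
Something along these lines run from the RA's monitor (or pre-promote) call is 
roughly what I'm picturing (untested sketch, reusing the PGSQL-xlog-loc 
attribute name from the existing RA):

    # untested sketch: publish this node's replay location as a transient node attribute
    LOC=$(psql -At -c "SELECT pg_last_xlog_replay_location();")
    crm_attribute -l reboot -N "$(uname -n)" -n "PGSQL-xlog-loc" -v "$LOC"

The surviving slaves could then compare those attribute values and bump the 
master score of whichever node best fits case 1 or 2 above.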

v/r

STEVE

On Mar 29, 2013, at 8:35 AM, Takatoshi MATSUO <matsuo....@gmail.com> wrote:

> Hi Steven
> 
> 2013/3/29 Steven Bambling <smbambl...@arin.net>:
>> 
>> On Mar 28, 2013, at 8:13 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:
>> 
>> Hi Steve,
>> I think you have misunderstood how the IP addresses are used with this setup;
>> PGVIP should start after promotion.
>> Take a look at Takatoshi's Wiki.
>> https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication
>> 
>> 
>> I see that he has the master/replication VIPs with a resource order to force
>> promotion before moving the VIPs to the new master.
>> 
>> I don't get how the postgres service is going to listen on those
>> interfaces if they have not already migrated to the new master, even with
>> listen_addresses = "*" set.
>> 
>> 
>> The promotion sequence is very simple.
>> When no master exists, all slaves write their current replay xlog location into
>> the node attribute PGSQL-xlog-loc during the monitor call.
>> 
>> Does this also hold true if a Master fails?
>> 
>> From the looks of it, if there was a Master before the failure, then the
>> master score is set from the function that grabs the data_status from the
>> master (STREAMING|SYNC, STREAMING|ASYNC, STREAMING|POTENTIAL, etc.).
>> 
>> The reason I ask is that if the master fails and the slaves don't then compare
>> their xlog locations, there is a potential for data loss if the incorrect
>> slave is promoted.
>> 
> 
> If rep_mode is "async", the RA works as Rainer says
> because the RA can't know which node should be promoted.
> 
> OTOH if rep_mode is "sync", the RA promotes the node which has
> "STREAMING|SYNC" if the master fails.
> So the incorrect slave can't be promoted.
> Instead, slaves whose xlog is newer than the new Master's
> are forcibly failed on pre-promote.
> 
> 
>> 
>> You can see all of them with crm_mon -A1f.
>> Each slave gets these attributes from all nodes configured in the parameter
>> node_list (hopefully your node names in Pacemaker are the same as in
>> node_list) and compares them to find the highest.
>> If the highest in this list is its own, it sets the master-score to
>> 1000, and on the other nodes to 100.
>> Pacemaker then selects the node with the highest master score and promotes
>> it.
>> 
>> Rainer
>> Sent: Wednesday, 27 March 2013 at 14:37
>> From: "Steven Bambling" <smbambl...@arin.net>
>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> Subject: Re: [Pacemaker] PGSQL resource promotion issue
>> In talking with andreask on IRC, I misunderstood the need to include
>> the op monitor.  I figured it was pulled from the resource script by
>> default.
>> 
>> I used pcs to add the new attributes and one was then promoted to master
>> 
>> pcs resource add_operation PGSQL monitor interval=5s role=Master
>> pcs resource add_operation PGSQL monitor interval=7s
>> 
>> v/r
>> 
>> STEVE
>> 
> 
> --
> Takatoshi MATSUO
> 


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
