On Fri, May 20, 2011 at 8:04 AM, Eamon Roque <[email protected]> wrote:
> > On Fri, May 20, 2011 at 7:09 AM, Eamon Roque <[email protected]> wrote:
> >
> > > > On Fri, May 20, 2011 at 3:42 AM, Eamon Roque <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > >> On Thu, May 19, 2011 at 5:05 AM, Eamon Roque <[email protected]> wrote:
> > > > > >>
> > > > > >> Hi,
> > > > > >>
> > > > > >> I've put together a cluster of two nodes running a databank without
> > > > > >> shared storage. Both nodes replicate data between them, which is
> > > > > >> taken care of by the databank itself.
> > > > > >>
> > > > > >> I have a resource for the databank and ip. I then created a stateful
> > > > > >> clone from the databank resource. I created colocation rules joining
> > > > > >> the databank-ms-clone and ip:
> > > > > >>
> > > > > >> node pgsqltest1
> > > > > >> node pgsqltest2
> > > > > >> primitive Postgres-IP ocf:heartbeat:IPaddr2 \
> > > > > >>         params ip="10.19.57.234" cidr_netmask="32" \
> > > > > >>         op monitor interval="30s" \
> > > > > >>         meta is-managed="false"
> > > > > >> primitive resPostgres ocf:heartbeat:pgsql \
> > > > > >>         params pgctl="/opt/PostgreSQL/9.0/bin/pg_ctl" pgdata="/opt/PostgreSQL/9.0/data" psql="/opt/PostgreSQL/9.0/bin/psql" pgdba="postgres" \
> > > > > >>         op monitor interval="1min" \
> > > > > >>         meta is-managed="false"
> > > > > >> ms msPostgres resPostgres \
> > > > > >>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="started"
> > > > > >> colocation colPostgres inf: Postgres-IP msPostgres:Master
> > > > > >> order ordPostgres inf: msPostgres:promote Postgres-IP:start
> > > > > >> property $id="cib-bootstrap-options" \
> > > > > >>         dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> > > > > >>         cluster-infrastructure="openais" \
> > > > > >>         expected-quorum-votes="2" \
> > > > > >>         stonith-enabled="false" \
> > > > > >>         no-quorum-policy="ignore" \
> > > > > >>         last-lrm-refresh="1302707146"
> > > > > >> rsc_defaults $id="rsc-options" \
> > > > > >>         resource-stickiness="200"
> > > > > >> op_defaults $id="op_defaults-options" \
> > > > > >>         record-pending="false"
> > > > > >>
> > > > > >> The normal postgres agent doesn't support this functionality, but
> > > > > >> I've put together my own using the mysql agent as a model. Before
> > > > > >> running the script through ocf-tester, I unmanage the postgres
> > > > > >> resource.
> > > > > >>
> > > > > >
> > > > > > Could you show how you implemented promote/demote for pgsql?
> > > > >
> > > > > Sure, let's start with the ultra-simple "promote" function:
> > > > >
> > > > > #
> > > > > # These variables are higher up in the file, but they will probably help with understanding the error of
> > > > > # my ways.
> > > > >
> > > > > CRM_MASTER="${HA_SBIN_DIR}/crm_master"
> > > > > ATTRD_UPDATER="${HA_SBIN_DIR}/attrd_updater"
> > > > >
> > > > > pgsql_promote() {
> > > > >         local output
> > > > >         local rc
> > > > >         local CHECK_PG_SQL
> > > > >         local COMPLETE_STANDBY_QUERY
> > > > >         local PROMOTE_SCORE_HIGH
> > > > >         local MOD_PSQL_M_FORMAT
> > > > >
> > > > >
> > > > >         PROMOTE_SCORE_HIGH=1000
> > > > >         CHECK_PG_SQL="SELECT pg_is_in_recovery()"
> > > > >         MOD_PSQL_M_FORMAT="$OCF_RESKEY_psql -Atc"
> > > > >         COMPLETE_STANDBY_QUERY="$MOD_PSQL_M_FORMAT \"$CHECK_PG_SQL\""
> > > > >
> > > > >         output=$(su - $OCF_RESKEY_pgdba -c "$COMPLETE_STANDBY_QUERY" 2>&1)
> > > > >         echo $output
> > > > >
> > > > >         rc=$?
> > > > >
> > > > >         case $output in
> > > > >                 f)
> > > > >                         ocf_log debug "PostgreSQL Node is running in Master mode..."
> > > > >                         return $OCF_RUNNING_MASTER
> > > > >                         ;;
> > > > >
> > > > >                 t)
> > > > >                         ocf_log debug "PostgreSQL Node is in Hot_Standby mode..."
> > > > >                         return $OCF_SUCCESS
> > > > >                         ;;
> > > > >
> > > > >                 *)
> > > > >                         ocf_log err "Critical error in $CHECK_PG_SQL: $output"
> > > > >                         return $OCF_ERR_GENERIC
> > > > >                         ;;
> > > > >         esac
> > > > >
> > > > >         #
> > > > >         # "Real" promotion is handled here.
> > > > >         # The trigger file is created and we check for "recovery.conf" on the host.
> > > > >         # If we can't find it, then the file will be copied from the HA-Config into postgres' data folder.
> > > > >         #
> > > > >
> > > > >         if ! touch $OCF_RESKEY_trigger_file; then
> > > > >                 ocf_log err "$OCF_RESKEY_trigger_file could not be created!"
> > > > >                 return $OCF_ERR_GENERIC
> > > > >         fi
> > > > >
> > > > >         if [ ! -f $OCF_RESKEY_recovery_conf ]; then
> > > > >                 ocf_log err "$OCF_RESKEY_recovery_conf doesn't exist!"
> > > > >                 cp $OCF_RESKEY_recovery_conf_ersatz $OCF_RESKEY_pgdata
> > > > >                 return $OCF_SUCCESS
> > > > >         fi
> > > > >
> > > >
> > > > Why do you need this? As far as I know when you switch standby database
> > > > to primary using trigger file recovery.conf gets renamed to
> > > > recovery.done. If you rename it back DB will be put into standby mode
> > > > after restart. We are talking about streaming replication, right?
> > > >
> > >
> > > Right. The order is wrong. According to the Binary Replication tutorial
> > > on the postgres wiki, when I perform a failover with a trigger file, it
> > > wants to find a "recovery.conf", which it then processes (checking the
> > > archive for missing updates etc.) and renames (after noticing the
> > > trigger file).
> > >
> > > I assumed that this would work in exactly the same way with Streaming
> > > Replication.
> > >
> > > Am I wrong?
> >
> > I think so. You have to have recovery.conf when you start your standby,
> > not master. Actually instance that has recovery.conf always tries to
> > start as standby. You have to have master's IP address there and path to
> > archived log files.
> >
>
> So the failover behavior in binary replication and streaming replication is
> different? Or is the wiki entry just antiquated?
>

This is correct: http://wiki.postgresql.org/wiki/Streaming_Replication

> > > > >
> > > > >         # If both file exist or can be created, then the failover fun can start.
> > > > >
> > > > >         ocf_log info "$OCF_RESKEY_trigger_file was created."
> > > > >         ocf_log info "$OCF_RESKEY_recovery_conf exists and can be copied to the correct location."
> > > > >
> > > > >         # Sometimes, the master needs a bit of time to take the reins. So...
> > > > >
> > > > >         while :
> > > > >         do
> > > > >                 pgsql_monitor warn
> > > > >                 rc=$?
> > > > >
> > > > >                 if [ $rc -eq $OCF_RUNNING_MASTER ]; then
> > > > >                         break;
> > > > >                 fi
> > > > >
> > > > >                 ocf_log debug "Postgres Server could not be promoted. Please wait..."
> > > > >
> > > > >                 sleep 1
> > > > >
> > > > >         done
> > > > >
> > > > >         ocf_log info "Postgres Server has been promoted. Please check on the previous master."
> > > > >
> > > > >         #################################
> > > > >         #Attributes Update:             #
> > > > >         #################################
> > > > >
> > > > >         $ATTRD_UPDATER -n $PGSQL_STATUS_NAME -v \"PRI\" || exit $(echo "Eh! Attrd_updater is not working!")
> > > > >
> > > > >         #############################################
> > > > >         # Resource stickiness pumped up to 1000 :   #
> > > > >         #############################################
> > > > >
> > > > >         $CRM_MASTER -v $PROMOTE_WERT_HOCH || exit $(echo "crm_master could not change the Master's status!")
> > > > >
> > > > >         ############
> > > > >         # Success! #
> > > > >         ############
> > > > >
> > > > >         return $OCF_SUCCESS
> > > > >
> > > > > }
> > > > >
> > > > > ######################################################################################################
> > > > >
> > > > > Thanks!
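A few control-flow details in the quoted promote function stand out: every branch of the case statement returns, so the trigger-file code below it is never reached; rc=$? records the exit status of echo rather than of psql; $PROMOTE_WERT_HOCH is never set (the local is called PROMOTE_SCORE_HIGH); and "exit $(echo ...)" hands the message text to exit instead of a number. What follows is only a minimal sketch of a corrected flow, not a finished agent. The OCF codes are hard-coded so that it runs standalone (a real agent gets them from ocf-shellfuncs), and log_err, pg_state_name, promote_sketch, QUERY_CMD and TRIGGER_FILE are hypothetical names used for illustration. The wait loop and the attribute updates are left out.

```shell
#!/bin/sh
# Standalone sketch: OCF codes are hard-coded here so the snippet runs
# without ocf-shellfuncs. All function and variable names are hypothetical.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Log to stderr; the caller then returns a numeric code explicitly
# instead of using "exit $(echo ...)".
log_err() {
    echo "ERROR: $*" >&2
}

# Map the output of "SELECT pg_is_in_recovery()" (psql -Atc prints t or f)
# to a state name.
pg_state_name() {
    case "$1" in
        f) echo master ;;
        t) echo standby ;;
        *) echo unknown ;;
    esac
}

# QUERY_CMD stands in for the su/psql call and TRIGGER_FILE for
# $OCF_RESKEY_trigger_file in the quoted agent.
promote_sketch() {
    # Check the command's status directly; an echo in between would
    # overwrite $?.
    output=$($QUERY_CMD 2>&1) || {
        log_err "query failed: $output"
        return $OCF_ERR_GENERIC
    }

    case "$(pg_state_name "$output")" in
        master)
            # Already promoted: nothing left to do.
            return $OCF_SUCCESS
            ;;
        standby)
            # Fall through to the actual promotion below.
            ;;
        *)
            log_err "unexpected output: $output"
            return $OCF_ERR_GENERIC
            ;;
    esac

    touch "$TRIGGER_FILE" || {
        log_err "cannot create $TRIGGER_FILE"
        return $OCF_ERR_GENERIC
    }
    return $OCF_SUCCESS
}
```

Once promotion has been confirmed by the monitor loop, the master score would be set with the variable that was actually declared, for example $CRM_MASTER -v $PROMOTE_SCORE_HIGH.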
> > > > >
> > > >
> > > > And what about demote? Switching standby into primary using trigger
> > > > files changes TIMELINE in the DB and that invalidates all other standby
> > > > databases as well as previous master database. After that you have to
> > > > restore them from a fresh backup made on new master. This particular
> > > > behavior stopped me from implementing Master/Slave functionality in
> > > > pgsql RA so far.
> > > >
> > > > BTW, why pgsql is set to is-managed="false" in your configuration. With
> > > > this setting cluster will keep monitoring it but won't take any other
> > > > actions AFAIK.
> > >
> > > Demote? Well, seeing as neither promote nor demote actually worked for
> > > me, I thought I would start small.
> > >
> >
> > It doesn't work because you have it in unmanaged state I think.
> >
>
> I'm using the ocf-tester utility to test the agent. Won't there be a
> conflict if I try and have the cluster manage the resources and then try and
> wrest it's control away with my own testing agent?
>
> > >
> > > As far as the trigger file switching goes, you're of course completely
> > > right. This behavior isn't really a big deal in my environment, as it's
> > > meant as more of test and we want to bring back the demoted servers up
> > > manually, but I can see that it would cause a lot of problems in a more
> >
> > That means that demote operation should stop master server which isn't
> > the best behavior IMHO.
> >
>
> I don't disagree. This was the policy that was "agreed" upon, so it's more
> of a political issue, really.
>
> Would you prefer putting it into RO mode?
>
> > > complex environment. When I tested the failover functionality without
> > > pacemaker, I have to perform a fresh backup even if I waited less than
> > > 30s to bring the old master back up as a standby.
> > >
> > > I guess that with 9.1 this will be easier...
> > >
> > > I unmanaged the resources so that my test agent would handle them. Is
> > > this incorrect?
> > >
> >
> > Again I think you are wrong. In this mode pacemaker won't call your RA to
> > promote/demote or failover your resource.
> >
> > > > >
> > > > > Éamon
> > > > >
> > > > > >> Unfortunately, promote/demote doesn't work. ocf-tester tries to use
> > > > > >> the "crm_attribute -N pgsql1 -n master-pgrql-replication-agent -l
> > > > > >> reboot -v 100", but the (unmanaged) resources don't accept the score
> > > > > >> change.
> > > > > >>
> > > > > >> I'm pretty sure that I just need to be hit with a clue stick and
> > > > > >> would be grateful for any help.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Éamon
> > > > > >>
> > > > >
> > > > > --
> > > > > Serge Dubrouski.
> > > > > _______________________________________________
> > > > > Pacemaker mailing list: [email protected]
> > > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > > >
> > > > > Project Home: http://www.clusterlabs.org
> > > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> > > > >
> > > >
> > > > --
> > > > Serge Dubrouski.
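If the goal is for Pacemaker itself to call promote and demote (rather than only running the agent through ocf-tester), the resources have to be managed. A sketch of the two primitives from the quoted configuration with the is-managed="false" meta attribute dropped; every other value is copied unchanged from the original, and whether those values suit the environment is an assumption:

```
primitive Postgres-IP ocf:heartbeat:IPaddr2 \
        params ip="10.19.57.234" cidr_netmask="32" \
        op monitor interval="30s"
primitive resPostgres ocf:heartbeat:pgsql \
        params pgctl="/opt/PostgreSQL/9.0/bin/pg_ctl" pgdata="/opt/PostgreSQL/9.0/data" psql="/opt/PostgreSQL/9.0/bin/psql" pgdba="postgres" \
        op monitor interval="1min"
```

The ms, colocation and order lines from the original configuration would stay as they are.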
--
Serge Dubrouski.
