We might need Marek on this one, because I did not implement the master/slave logic and am not using it with multiple slaves.
Marek, can you please comment on this? Thanks, Raoul

On 2011-08-29 23:08, Michael Szilagyi wrote:
Did some more testing and figured I would add that even Slave resources rejoin the cluster in a Master role briefly before switching back to Slave. Of course, since the mysql RA uses event notification, this still has the effect of unsetting all masters whenever a new node joins. Since a master role is possibly already configured, the pre-promote notification event doesn't fire again and replication remains broken. It seems likely that I'm doing something wrong, since this is a pretty normal use case and it completely breaks the MySQL replication cluster. Thoughts, anyone?

On Fri, Aug 26, 2011 at 10:19 AM, Michael Szilagyi <mszila...@gmail.com> wrote:

I'm having a problem with master/slave promotion using the most recent version of the mysql OCF script hosted in the ClusterLabs/resource-agents GitHub repo. The script fails over to a slave correctly if the master loses its connection to the cluster. However, when the master rejoins the cluster, the script does some undesirable things.

Basically, if the master loses its connection (say I pull the network cable), a new slave is promoted and the old master is simply orphaned (which is fine; I don't have STONITH set up yet). If I plug that machine's cable back in, the node rejoins the cluster and there are initially two masters (the old, orphaned one and the newly promoted one). Pacemaker properly sees this and demotes the old master to a slave.

After some time debugging the OCF script, I think what is happening is this: the script sees the old master join and fires a post-demote notification event for the returning master, which causes an unset_master command to be executed. This makes all the slaves remove their master connection info. However, since the other server has already been promoted and is (to its mind) already replicating to the other slaves in the cluster, a new pre-promote is never fired. That means the slaves never get a new CHANGE MASTER TO issued, and I wind up with a broken replication setup.

I'm not sure if I'm missing something in how this is supposed to work or if it's a limitation of the script. It seems like there must be either a bug or something I've set up wrong, since it's not all that unlikely that such a scenario could occur. If anyone has ideas or suggestions on how the script is supposed to work (or what I may be doing wrong), I'd appreciate them.
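[Editor's note: for readers following the notification flow above, here is a minimal sketch of how a mysql-style OCF resource agent might dispatch clone notifications. It is illustrative only, not the actual resource-agents code: unset_master is the helper named in the thread, set_master is a hypothetical counterpart, and the exact SQL may differ in the real script. The OCF_RESKEY_CRM_meta_notify_* variables are the standard ones Pacemaker passes to a notify action.]

#!/bin/sh
# Illustrative sketch only; not the actual ocf:heartbeat:mysql code.
# Pacemaker passes OCF_RESKEY_CRM_meta_notify_type ("pre"/"post") and
# OCF_RESKEY_CRM_meta_notify_operation (start/stop/promote/demote) to
# the RA's notify action; the promoted node's name arrives in
# OCF_RESKEY_CRM_meta_notify_promote_uname.

OCF_SUCCESS=0   # normally sourced from ocf-shellfuncs

set_master() {
    # Point this slave at the newly promoted master. Credentials here
    # mirror the replication_user/replication_passwd params from the
    # crm configuration below; the real RA's SQL may differ.
    mysql -e "CHANGE MASTER TO
                  MASTER_HOST='${OCF_RESKEY_CRM_meta_notify_promote_uname}',
                  MASTER_USER='sqlSlave', MASTER_PASSWORD='slave';
              START SLAVE;"
}

unset_master() {
    # Clear this slave's master connection info entirely.
    mysql -e "STOP SLAVE; RESET SLAVE;"
}

mysql_notify() {
    case "${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}" in
    pre-promote)
        # Slaves learn about the new master here via CHANGE MASTER TO.
        set_master
        ;;
    post-demote)
        # Fired on every slave when a master is demoted, including when
        # the old, orphaned master rejoins and gets demoted. If no
        # pre-promote follows (the surviving master was promoted long
        # ago), the slaves are left with no master at all.
        unset_master
        ;;
    esac
    return $OCF_SUCCESS
}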
I'll include the output of my crm configure show in case it'll be useful:

node $id="a1a3266d-24e2-4d1b-bfd7-de3bac929661" seven \
        attributes 172.17.0.130-log-file-p_mysql="mysql-bin.000005" \
                172.17.0.130-log-pos-p_mysql="865" \
                172.17.0.131-log-file-p_mysql="mysql-bin.000038" \
                172.17.0.131-log-pos-p_mysql="607" \
                four-log-file-p_mysql="mysql-bin.000040" \
                four-log-pos-p_mysql="2150"
node $id="cc0227a2-a7bc-4a0d-ba1b-f6ecb7e7d845" four \
        attributes 172.17.0.130-log-file-p_mysql="mysql-bin.000005" \
                172.17.0.130-log-pos-p_mysql="865" \
                three-log-file-p_mysql="mysql-bin.000022" \
                three-log-pos-p_mysql="106"
node $id="d9d3c6cb-bf60-4468-926f-d9716e56fb0f" three \
        attributes 172.17.0.131-log-file-p_mysql="mysql-bin.000038" \
                172.17.0.131-log-pos-p_mysql="607" \
                three-log-pos-p_mysql="4"
primitive p_mysql ocf:heartbeat:mysql \
        params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" \
        params pid="/var/lib/mysql/mySQL.pid" socket="/var/run/mysqld/mysqld.sock" \
        params replication_user="sqlSlave" replication_passwd="slave" \
        params additional_parameters="--skip-slave-start" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op promote interval="0" timeout="120" \
        op demote interval="0" timeout="120" \
        op monitor interval="5" role="Master" timeout="30" \
        op monitor interval="10" role="Slave" timeout="30"
ms ms_mysql p_mysql \
        meta master-max="1" clone-max="3" target-role="Started" is-managed="true" notify="true" \
        meta target-role="Started"
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        last-lrm-refresh="1314307995"

Thanks!
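[Editor's note: one way to observe the broken state described above is to inspect the slave threads on each node after the old master rejoins. This is a generic MySQL check, not part of the resource agent, and the credentials are illustrative; if unset_master ran without a subsequent CHANGE MASTER TO, Master_Host will be empty or the slave threads stopped.]

# Run on each slave node after the old master rejoins:
mysql -u root -p -e "SHOW SLAVE STATUS\G" \
    | grep -E 'Master_Host|Slave_IO_Running|Slave_SQL_Running'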
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.    r.bha...@ipax.at
Technischer Leiter
IPAX - Aloy Bhatia Hava OEG         web.      http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.    off...@ipax.at
1190 Wien                           tel.      +43 1 3670030
FN 277995t HG Wien                  fax.      +43 1 3670030 15
____________________________________________________________________

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker