Hello Attila,

On 12/20/2011 09:29 AM, Attila Megyeri wrote:
> Hi Andreas,
>
> -----Original Message-----
> From: Andreas Kurz [mailto:andr...@hastexo.com]
> Sent: 19 December 2011 15:19
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] [SOLVED] RE: Slave does not start after failover:
> Mysql circular replication and master-slave resources
>
> On 12/17/2011 10:51 AM, Attila Megyeri wrote:
>> Hi all,
>>
>> For anyone interested:
>> I finally made the mysql replication work. For some strange reason there
>> were no [mysql] log entries at all, neither in corosync.log nor in the
>> syslog. After a couple of corosync restarts (?!), [mysql] RA debug/error
>> entries started to show up.
>> The issue was that the slave could not apply the binary logs due to some
>> duplicate-key errors. I am not sure how this could happen, but the
>> solution was to ignore the duplicate errors on the slaves by adding the
>> following line to my.cnf:
>>
>> slave-skip-errors = 1062
>
> although you use different "auto-increment-offset" values?
>
> Yes... I am actually quite surprised how this can happen. The slave has
> applied the binlog already, but for some reason it wants to execute it
> again.
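Just for reference: with circular replication, duplicate keys on the slave
are normally prevented by giving each master a distinct offset, e.g.
something like this in each node's my.cnf (the values below are only an
example for a two-node setup):

    # on db1
    auto_increment_increment = 2
    auto_increment_offset    = 1

    # on db2
    auto_increment_increment = 2
    auto_increment_offset    = 2

That way db1 only generates odd and db2 only even ids, so replicated rows
cannot collide on auto-increment primary keys - and skipping error 1062
then only masks real conflicts.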
>>
>> I hope this helps some of you guys as well.
>>
>> P.S. Did anyone else notice missing mysql debug/info/error entries in
>> the corosync log as well?
>
> There is no RA output/log in any of your syslogs? ... in the absence of a
> connected tty and with no logd configured, logger should feed all logs to
> syslog ... what is your distribution, any "fancy" syslog configuration?
>
> My system is running on Debian squeeze, with pacemaker 1.1.5 from the
> squeeze backports. The syslog configuration is standard, no extras. I
> have noticed this strange behavior (the RA not logging anything) many
> times - not only for the mysql resource but also for postgres. E.g. I
> added an ocf_log call at the entry point of the RA, just to log when the
> script is executed and which parameters were passed - but I did not see
> any "monitor" invocations either.
> Now it works fine, but this is not an absolutely stable setup.

seems to be "improvable", see: http://bugs.clusterlabs.org/show_bug.cgi?id=5024

> One other very disturbing issue is that sometimes corosync and some of
> the heartbeat processes get stuck at 100% CPU, and only a restart or
> kill -9 helps. :(

now that looks really ugly ... are you using the MCP, or do you let
corosync start pacemaker?
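For comparison: with the plugin, corosync itself forks and supervises all
pacemaker daemons; with the MCP you start pacemakerd from its own init
script and corosync only provides membership/messaging, which is usually
the more robust choice when processes start spinning. The difference is
just the "ver" value in the pacemaker service stanza in corosync.conf (or
in a snippet under /etc/corosync/service.d/):

    service {
            # ver: 0 -> corosync starts pacemaker itself (plugin mode)
            # ver: 1 -> pacemakerd is started separately (MCP)
            name: pacemaker
            ver:  1
    }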
Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

> Cheers,
> Attila
>
> Regards,
> Andreas

>> Cheers,
>> Attila
>>
>> -----Original Message-----
>> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
>> Sent: 16 December 2011 12:39
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Slave does not start after failover: Mysql
>> circular replication and master-slave resources
>>
>> Hi Andreas,
>>
>> The slave lag cannot be high, as the slave was restarted within 1-2
>> minutes and there are no active users on the system yet.
>> I did not find anything at all in the logs.
>>
>> I will double-check that the RA is the latest.
>>
>> Thanks,
>> Attila
>>
>> -----Original Message-----
>> From: Andreas Kurz [mailto:andr...@hastexo.com]
>> Sent: 16 December 2011 01:50
>> To: pacemaker@oss.clusterlabs.org
>> Subject: Re: [Pacemaker] Slave does not start after failover: Mysql
>> circular replication and master-slave resources
>>
>> Hello Attila,
>>
>> ... see below ...
>>
>> On 12/15/2011 02:42 PM, Attila Megyeri wrote:
>>> Hi All,
>>>
>>> Some time ago I exchanged a couple of posts with you here regarding
>>> Mysql active-active HA. The best solution I found so far was the Mysql
>>> multi-master replication, also referred to as circular replication.
>>>
>>> Basically I set up two nodes, both capable of the master role, and
>>> changes were immediately propagated to the other node.
>>>
>>> But I still wanted a M/S approach, with a RW master and a RO slave -
>>> mainly because I prefer to have a single master VIP that my apps can
>>> connect to.
>>> (In the first approach I had configured a two-node clone, and the
>>> master IP was always bound to one of the nodes.)
>>>
>>> I applied the following configuration:
>>>
>>> node db1 \
>>>         attributes IP="10.100.1.31" \
>>>         attributes standby="off" \
>>>         db2-log-file-db-mysql="mysql-bin.000021" \
>>>         db2-log-pos-db-mysql="40730"
>>> node db2 \
>>>         attributes IP="10.100.1.32" \
>>>         attributes standby="off"
>>> primitive db-ip-master ocf:heartbeat:IPaddr2 \
>>>         params lvs_support="true" ip="10.100.1.30" cidr_netmask="8" \
>>>               broadcast="10.255.255.255" \
>>>         op monitor interval="20s" timeout="20s" \
>>>         meta target-role="Started"
>>> primitive db-mysql ocf:heartbeat:mysql \
>>>         params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
>>>               datadir="/var/lib/mysql" user="mysql" \
>>>               pid="/var/run/mysqld/mysqld.pid" \
>>>               socket="/var/run/mysqld/mysqld.sock" test_passwd="XXXXX" \
>>>               test_table="replicatest.connectioncheck" \
>>>               test_user="slave_user" replication_user="slave_user" \
>>>               replication_passwd="XXXXX" \
>>>               additional_parameters="--skip-slave-start" \
>>>         op start interval="0" timeout="120s" \
>>>         op stop interval="0" timeout="120s" \
>>>         op monitor interval="30" timeout="30s" OCF_CHECK_LEVEL="1" \
>>>         op promote interval="0" timeout="120" \
>>>         op demote interval="0" timeout="120"
>>> ms db-ms-mysql db-mysql \
>>>         meta notify="true" master-max="1" clone-max="2" \
>>>         target-role="Started"
>>> colocation db-ip-with-master inf: db-ip-master db-ms-mysql:Master
>>> property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>>         cluster-infrastructure="openais" \
>>>         expected-quorum-votes="2" \
>>>         stonith-enabled="false" \
>>>         no-quorum-policy="ignore"
>>> rsc_defaults $id="rsc-options" \
>>>         resource-stickiness="0"
>>>
>>> The setup works under basic conditions:
>>>
>>> * After the "first" startup, the nodes start as slaves and, shortly
>>>   after, one of them is promoted to master.
>>> * Updates to the master are replicated properly to the slave.
>>> * The slave accepts updates, which is wrong, but I can live with this -
>>>   I will allow connections to the master VIP only.
>>> * If I stop the slave for some time and restart it, it catches up with
>>>   the master shortly and gets back in sync.
>>>
>>> I have, however, a serious issue:
>>>
>>> * If I stop the current master, the slave is promoted, accepts RW
>>>   queries, and the master IP is bound to it - ALL fine.
>>> * BUT - when I want to bring the other node online, it simply shows:
>>>   Stopped (not installed)
>>>
>>> Online: [ db1 db2 ]
>>>
>>> db-ip-master (ocf::heartbeat:IPaddr2): Started db1
>>> Master/Slave Set: db-ms-mysql [db-mysql]
>>>      Masters: [ db1 ]
>>>      Stopped: [ db-mysql:1 ]
>>>
>>> Node Attributes:
>>> * Node db1:
>>>     + IP                    : 10.100.1.31
>>>     + db2-log-file-db-mysql : mysql-bin.000021
>>>     + db2-log-pos-db-mysql  : 40730
>>>     + master-db-mysql:0     : 3601
>>> * Node db2:
>>>     + IP                    : 10.100.1.32
>>>
>>> Failed actions:
>>>     db-mysql:0_monitor_30000 (node=db2, call=58, rc=5,
>>>     status=complete): not installed
>>
>> Looking at the RA (latest from git), I'd say the problem is somewhere in
>> the check_slave() function: either the check for replication errors or
>> the one for a too-high slave lag ... though on both errors you should
>> see log entries.
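If it is the lag check that trips, newer versions of the RA let you tune
it explicitly instead of relying on the default - a sketch, assuming your
RA version already ships the max_slave_lag/evict_outdated_slaves
parameters (verify with "crm ra info ocf:heartbeat:mysql"; the other
params stay as in your config, abbreviated here):

    primitive db-mysql ocf:heartbeat:mysql \
            params config="/etc/mysql/my.cnf" ... \
                  max_slave_lag="120" evict_outdated_slaves="false" \
            op monitor interval="30" timeout="30s" OCF_CHECK_LEVEL="1"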
>>
>> Regards,
>> Andreas
>
>>> I checked the logs and could not find a reason why the slave on db2 is
>>> not started.
>>>
>>> Any ideas, anyone?
>>>
>>> Thanks,
>>> Attila
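P.S. Once the underlying replication problem is fixed, remember that a
"not installed" (rc=5) monitor failure is treated as a hard error and
keeps the instance stopped on that node until the failure is cleaned up
and the node is re-probed, e.g. with the crm shell:

    crm resource cleanup db-ms-mysql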
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org