Hi,

On Thu, Aug 23, 2012 at 04:47:11PM -0400, David Parker wrote:
> On 08/23/2012 04:19 PM, Jake Smith wrote:
> >> Okay, I think I've almost got this. I updated my Pacemaker config
> >> and made a few changes. I put the MysqlIP and mysqld primitives
> >> into a resource group called "mysql-resources", ordered them so
> >> that mysqld always waits for MysqlIP to be ready first, and added
> >> constraints to make ha1 the preferred host for the mysql-resources
> >> group and ha2 the failover host. I also created STONITH devices
> >> for both ha1 and ha2, and added constraints to fix the STONITH
> >> location issues. My new constraints section looks like this:
> >>
> >> <constraints>
> >>   <rsc_location id="loc-1" rsc="stonith-ha1" node="ha2" score="INFINITY"/>
> >>   <rsc_location id="loc-2" rsc="stonith-ha2" node="ha1" score="INFINITY"/>
> >
> > You don't need the two above as long as you have the two negative
> > locations below for the stonith resources. I prefer the negative
> > form because, if you ever expand to more than two nodes, the
> > stonith resource for any node can then run on any node but itself.
>
> Good call. I'll take those out of the config.
>
> >>   <rsc_location id="loc-3" rsc="stonith-ha1" node="ha1" score="-INFINITY"/>
> >>   <rsc_location id="loc-4" rsc="stonith-ha2" node="ha2" score="-INFINITY"/>
> >>   <rsc_location id="loc-5" rsc="mysql-resources" node="ha1" score="200"/>
> >
> > You don't need the 0-score entry below either - the 200 above will
> > take care of it. I'm pretty sure having no location constraint is
> > the same as a 0-score location.
>
> That was based on the example found in the documentation. If I don't
> have the 0-score entry, will the service still fail over?
>
> >>   <rsc_location id="loc-6" rsc="mysql-resources" node="ha2" score="0"/>
> >> </constraints>
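To answer the failover question directly: yes, it will still fail over
without the 0-score entry. Unless you've changed the symmetric-cluster
default, every node may run every resource unless a -INFINITY
constraint bans it; positive scores only express preference. The
trimmed-down constraint section Jake is describing would look roughly
like this in crm shell syntax (a sketch using the ids from your
config, untested):

  # stonith devices: only ban each one from its own node
  location loc-3 stonith-ha1 -inf: ha1
  location loc-4 stonith-ha2 -inf: ha2
  # one positive preference is enough; ha2 needs no explicit 0-score entry
  location loc-5 mysql-resources 200: ha1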
> >> Everything seems to work. I had the virtual IP and mysqld running
> >> on ha1, and not on ha2. I shut down ha1 using "poweroff -n", and
> >> both the virtual IP and mysqld came up on ha2 almost instantly.
> >> When I powered ha1 on again, ha2 shut down the virtual IP and
> >> mysqld. The virtual IP moved over instantly; a continuous ping of
> >> the IP produced one "Time to live exceeded" message and one lost
> >> packet, but that's to be expected. However, mysqld took well over
> >> 30 seconds to start up on ha1 after being stopped on ha2, and I'm
> >> not exactly sure why.
> >>
> >> Here's the relevant log output from ha2:
> >>
> >> Aug 23 11:42:48 ha2 crmd: [1166]: info: te_rsc_command: Initiating action 16: stop mysqld_stop_0 on ha2 (local)
> >> Aug 23 11:42:48 ha2 crmd: [1166]: info: do_lrm_rsc_op: Performing key=16:1:0:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_stop_0 )
> >> Aug 23 11:42:48 ha2 lrmd: [1163]: info: rsc:mysqld:10: stop
> >> Aug 23 11:42:50 ha2 lrmd: [1163]: info: RA output: (mysqld:stop:stdout) Stopping MySQL daemon: mysqld_safe.
> >> Aug 23 11:42:50 ha2 crmd: [1166]: info: process_lrm_event: LRM operation mysqld_stop_0 (call=10, rc=0, cib-update=57, confirmed=true) ok
> >> Aug 23 11:42:50 ha2 crmd: [1166]: info: match_graph_event: Action mysqld_stop_0 (16) confirmed on ha2 (rc=0)
> >>
> >> And here's the relevant log output from ha1:
> >>
> >> Aug 23 11:42:47 ha1 crmd: [1243]: info: do_lrm_rsc_op: Performing key=8:1:7:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_monitor_0 )
> >> Aug 23 11:42:47 ha1 lrmd: [1240]: info: rsc:mysqld:5: probe
> >> Aug 23 11:42:47 ha1 crmd: [1243]: info: process_lrm_event: LRM operation mysqld_monitor_0 (call=5, rc=7, cib-update=10, confirmed=true) not running
> >> Aug 23 11:43:36 ha1 crmd: [1243]: info: do_lrm_rsc_op: Performing key=11:3:0:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_start_0 )
> >> Aug 23 11:43:36 ha1 lrmd: [1240]: info: rsc:mysqld:11: start
> >> Aug 23 11:43:36 ha1 lrmd: [1240]: info: RA output: (mysqld:start:stdout) Starting MySQL daemon: mysqld_safe. (See /usr/local/mysql/data/mysql.messages for messages).
> >> Aug 23 11:43:36 ha1 crmd: [1243]: info: process_lrm_event: LRM operation mysqld_start_0 (call=11, rc=0, cib-update=18, confirmed=true) ok
> >>
> >> So, ha2 stopped mysqld at 11:42:50, but ha1 didn't start mysqld
> >> until 11:43:36, a full 46 seconds after it was stopped on ha2. Any
> >> ideas why the delay for mysqld was so long, when the MysqlIP
> >> resource moved almost instantly?
> >
> > Couple thoughts.
> >
> > Are you sure both servers have the same time (in sync)?
>
> Yep. They're both using NTP.
>
> > On ha2, did you verify that mysqld was actually done stopping at
> > the 11:42:50 mark? I don't use mysql, so I can't say from
> > experience.
>
> Yes, I kept checking (with "ps -ef | grep mysqld") every few
> seconds, and it stopped running around that time. As soon as it
> stopped running on ha2, I started checking on ha1, and it was quite
> a while before mysqld started. I knew it was at least 30 seconds,
> and I believe it was actually 46 seconds, as the logs indicate.
>
> > Just curious, but do you really want it to fail back if it's
> > actively running on ha2?
>
> Interesting point. I had just assumed that it was good practice to
> have a preferred node for a service, but I guess it doesn't matter.
> If I don't care which node the services run on, do I just remove the
> location constraints for the "mysql-resources" group altogether?
>
> > Could you include the output of 'crm configure show' next time? I
> > read that much better/quicker than the XML pacemaker config :-)
> >
> > Jake
>
> Thanks so much for all of your help, Jake! I'm new to all of this,
> and I really appreciate it.
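On the fail-back question above: removing loc-5 (and loc-6) altogether
is one option. If you'd rather keep a mild preference for ha1 but stop
resources from moving back automatically, the usual tool is resource
stickiness. A sketch, with an illustrative value chosen to outweigh
the 200-point location score:

  crm configure rsc_defaults resource-stickiness=1000

With stickiness higher than the preference, the group stays put on ha2
after ha1 returns, which would also make the fail-back start delay you
measured a non-issue.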
> Here's the requested output:
>
> root@ha1:~# crm configure show
> node $id="1b48f410-44d1-4e89-8b52-ff23b32db1bc" ha1
> node $id="9790fe6e-67b2-4817-abf4-966b5aa6948c" ha2
> primitive MysqlIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.25.9" cidr_netmask="32" \
>         op monitor interval="10s"
> primitive mysqld lsb:mysqld
> primitive stonith-ha1 stonith:external/riloe \
>         params hostlist="ha1" ilo_hostname="10.0.1.111" \
>         ilo_user="Administrator" ilo_password="XXXXXXXX" \
>         ilo_can_reset="1" ilo_protocol="2.0" ilo_powerdown_method="button"
> primitive stonith-ha2 stonith:external/riloe \
>         params hostlist="ha2" ilo_hostname="10.0.1.112" \
>         ilo_user="Administrator" ilo_password="XXXXXXXX" \
>         ilo_can_reset="1" ilo_protocol="2.0" ilo_powerdown_method="button"
> group mysql-resources MysqlIP mysqld
> location loc-1 stonith-ha1 inf: ha2
> location loc-2 stonith-ha2 inf: ha1

loc-1 and loc-2 are superfluous.

> location loc-3 stonith-ha1 -inf: ha1
> location loc-4 stonith-ha2 -inf: ha2
> location loc-5 mysql-resources 200: ha1
> location loc-6 mysql-resources 0: ha2

loc-6 is a no-op.
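Cleaning both up should be a one-liner (a sketch; the crm shell's
configure delete takes one or more ids):

  crm configure delete loc-1 loc-2 loc-6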
Thanks,

Dejan

> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>         cluster-infrastructure="Heartbeat" \
>         stonith-enabled="true"
>
> Also, I verified that STONITH is working. I unplugged the network
> cable on ha1 while the virtual IP and mysqld were running. ha2
> promptly took over the services and used STONITH to shut down ha1
> via iLO. So, that part works flawlessly. There was once again a
> delay between the mysqld shutdown on ha2 and its startup on ha1
> after I brought ha1 back online, though it was not as bad as before:
> about 25 seconds this time. It seems that the delay only occurs when
> the non-preferred node relinquishes control of the resources back to
> their preferred node following a failover. If I stop preferring one
> node for the services, this might not be an issue any longer.
>
> - Dave
>
> --
> Dave Parker
> Systems Administrator
> Utica College
> Integrated Information Technology Services
> (315) 792-3229
> Registered Linux User #408177

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org