On 06/22/2012 11:58 AM, Sergey Tachenov wrote:
> Hi!
>
> I'm trying to set up a 2-node cluster. I'm new to pacemaker, but
> things are getting better and better. However, I am completely at a
> loss here.
>
> I have a cloned tomcat resource, which runs on both nodes and doesn't
> really depend on anything (it doesn't use DRBD or anything else of
> that sort). But I'm trying to get pacemaker to move the cluster IP to
> another node in case tomcat fails. Here are the relevant parts of my
> config:
>
> node srvplan1
> node srvplan2
> primitive DBIP ocf:heartbeat:IPaddr2 \
>   params ip="1.2.3.4" cidr_netmask="24" \
>   op monitor interval="10s"
> primitive drbd_pgdrive ocf:linbit:drbd \
>   params drbd_resource="pgdrive" \
>   op start interval="0" timeout="240" \
>   op stop interval="0" timeout="100"
> primitive pgdrive_fs ocf:heartbeat:Filesystem \
>   params device="/dev/drbd0" directory="/hd2" fstype="ext4"
> primitive ping ocf:pacemaker:ping \
>   params host_list="193.233.59.2" multiplier="1000" \
>   op monitor interval="10"
> primitive postgresql ocf:heartbeat:pgsql \
>   params pgdata="/hd2/pgsql" \
>   op monitor interval="30" timeout="30" depth="0" \
>   op start interval="0" timeout="60" \
>   op stop interval="0" timeout="60" \
>   meta target-role="Started"
> primitive tomcat ocf:heartbeat:tomcat \
>   params java_home="/usr/lib/jvm/jre" catalina_home="/usr/share/tomcat" \
>     tomcat_user="tomcat" script_log="/home/tmo/log/tomcat.log" \
>     statusurl="http://127.0.0.1:8080/status/" \
>   op start interval="0" timeout="60" \
>   op stop interval="0" timeout="120" \
>   op monitor interval="30" timeout="30"
> group postgres pgdrive_fs DBIP postgresql
> ms ms_drbd_pgdrive drbd_pgdrive \
>   meta master-max="1" master-node-max="1" clone-max="2" \
>     clone-node-max="1" notify="true"
> clone pings ping \
>   meta interleave="true"
> clone tomcats tomcat \
>   meta interleave="true" target-role="Started"
> location DBIPcheck DBIP \
>   rule $id="DBIPcheck-rule" 10000: defined pingd and pingd gt 0
> location master-prefer-node1 DBIP 50: srvplan1
> colocation DBIP-on-web 1000: DBIP tomcats
Try inf: ... a score of 1000 will not be enough, because DBIP is also
part of the postgres group and that group must follow the DRBD Master
(see the sketch below the quoted crm_mon output).

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> colocation postgres_on_drbd inf: postgres ms_drbd_pgdrive:Master
> order postgres_after_drbd inf: ms_drbd_pgdrive:promote postgres:start
>
> As you can see, there are three explicit constraints for the DBIP
> resource: preferred node (srvplan1, score 50), successful ping (score
> 10000) and running tomcat (score 1000). There's also the resource
> stickiness set to 100. Implicit constraints include the colocation of
> the postgres group with the DRBD master instance.
>
> The ping check works fine: if I unplug the external LAN cable or use
> iptables to block pings, everything gets moved to another node.
>
> The check for tomcat isn't working for some reason, though:
>
> [root@srvplan1 bin]# crm_mon -1
> ============
> Last updated: Fri Jun 22 10:06:59 2012
> Last change: Fri Jun 22 09:43:16 2012 via cibadmin on srvplan1
> Stack: openais
> Current DC: srvplan1 - partition with quorum
> Version: 1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 17 Resources configured.
> ============
>
> Online: [ srvplan1 srvplan2 ]
>
>  Master/Slave Set: ms_drbd_pgdrive [drbd_pgdrive]
>      Masters: [ srvplan1 ]
>      Slaves: [ srvplan2 ]
>  Resource Group: postgres
>      pgdrive_fs   (ocf::heartbeat:Filesystem):   Started srvplan1
>      DBIP         (ocf::heartbeat:IPaddr2):      Started srvplan1
>      postgresql   (ocf::heartbeat:pgsql):        Started srvplan1
>  Clone Set: pings [ping]
>      Started: [ srvplan1 srvplan2 ]
>  Clone Set: tomcats [tomcat]
>      Started: [ srvplan2 ]
>      Stopped: [ tomcat:0 ]
>
> Failed actions:
>     tomcat:0_start_0 (node=srvplan1, call=37, rc=-2,
>         status=Timed Out): unknown exec error
>
> As you can see, tomcat is stopped on srvplan1 (I have deliberately
> messed up the startup scripts), but everything else still runs there.
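This is what I mean above: with a score of only 1000 the policy engine
prefers to keep DBIP with its group and the DRBD Master. A minimal,
untested sketch of the stronger constraint (it is your existing
DBIP-on-web constraint, only with the score raised to INFINITY):

  # same constraint as in your config, score changed from 1000 to inf
  colocation DBIP-on-web inf: DBIP tomcats

With inf:, DBIP may no longer run on a node where the tomcat clone is
stopped; and because DBIP sits inside the postgres group, the rest of
the group and the DRBD Master should be pulled along with it (or DBIP
is stopped if they cannot move).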
> ptest -L -s shows:
>
> clone_color: ms_drbd_pgdrive allocation score on srvplan1: 10350
> clone_color: ms_drbd_pgdrive allocation score on srvplan2: 10000
> clone_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
> clone_color: drbd_pgdrive:0 allocation score on srvplan2: 0
> clone_color: drbd_pgdrive:1 allocation score on srvplan1: 0
> clone_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
> native_color: drbd_pgdrive:0 allocation score on srvplan1: 10100
> native_color: drbd_pgdrive:0 allocation score on srvplan2: 0
> native_color: drbd_pgdrive:1 allocation score on srvplan1: -INFINITY
> native_color: drbd_pgdrive:1 allocation score on srvplan2: 10100
> drbd_pgdrive:0 promotion score on srvplan1: 30700
> drbd_pgdrive:1 promotion score on srvplan2: 30000
> group_color: postgres allocation score on srvplan1: 0
> group_color: postgres allocation score on srvplan2: 0
> group_color: pgdrive_fs allocation score on srvplan1: 100
> group_color: pgdrive_fs allocation score on srvplan2: 0
> group_color: DBIP allocation score on srvplan1: 10150
> group_color: DBIP allocation score on srvplan2: 10000
> group_color: postgresql allocation score on srvplan1: 100
> group_color: postgresql allocation score on srvplan2: 0
> native_color: pgdrive_fs allocation score on srvplan1: 20450
> native_color: pgdrive_fs allocation score on srvplan2: -INFINITY
> clone_color: tomcats allocation score on srvplan1: -INFINITY
> clone_color: tomcats allocation score on srvplan2: 0
> clone_color: tomcat:0 allocation score on srvplan1: -INFINITY
> clone_color: tomcat:0 allocation score on srvplan2: 0
> clone_color: tomcat:1 allocation score on srvplan1: -INFINITY
> clone_color: tomcat:1 allocation score on srvplan2: 100
> native_color: tomcat:1 allocation score on srvplan1: -INFINITY
> native_color: tomcat:1 allocation score on srvplan2: 100
> native_color: tomcat:0 allocation score on srvplan1: -INFINITY
> native_color: tomcat:0 allocation score on srvplan2: -INFINITY
> native_color: DBIP allocation score on srvplan1: 9250
> native_color: DBIP allocation score on srvplan2: -INFINITY
> native_color: postgresql allocation score on srvplan1: 100
> native_color: postgresql allocation score on srvplan2: -INFINITY
> clone_color: pings allocation score on srvplan1: 0
> clone_color: pings allocation score on srvplan2: 0
> clone_color: ping:0 allocation score on srvplan1: 100
> clone_color: ping:0 allocation score on srvplan2: 0
> clone_color: ping:1 allocation score on srvplan1: 0
> clone_color: ping:1 allocation score on srvplan2: 100
> native_color: ping:0 allocation score on srvplan1: 100
> native_color: ping:0 allocation score on srvplan2: 0
> native_color: ping:1 allocation score on srvplan1: -INFINITY
> native_color: ping:1 allocation score on srvplan2: 100
>
> Why is the score for DBIP -INFINITY on srvplan2? The only INF rule in
> my config is the colocation rule for the postgres group. I can surmise
> that DBIP can't run on srvplan2 because DRBD isn't Master there, but
> there's nothing preventing it from being promoted there, and this rule
> doesn't stop DBIP from being moved in case of a ping failure either.
> So there must be something else.
>
> I also don't quite understand why the DBIP score is 9250 on srvplan1.
> It should be at least 10000 for the ping, plus 250 more for the node
> preference and stickiness. If I migrate DBIP to srvplan2 manually, the
> score is 10200 there, which makes me think that 1000 gets subtracted
> because tomcat is stopped on srvplan1. But why? This is a positive
> rule, not a negative one. It should just add 1000 if tomcat is
> running, but shouldn't subtract anything if it isn't, am I wrong?
>
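Regarding the -INFINITY on srvplan2: that comes from the group itself,
not from your explicit constraints. A group is essentially shorthand
for INFINITY colocation and ordering between consecutive members, so
"group postgres pgdrive_fs DBIP postgresql" behaves roughly like the
constraints below (the constraint names are made up for illustration):

  # implicit constraints created by the postgres group (illustration only)
  colocation DBIP-with-fs inf: DBIP pgdrive_fs
  colocation pgsql-with-DBIP inf: postgresql DBIP
  order fs-before-DBIP inf: pgdrive_fs DBIP
  order DBIP-before-pgsql inf: DBIP postgresql

Because pgdrive_fs is pinned to the DRBD Master on srvplan1, DBIP picks
up the -INFINITY on srvplan2 from that implicit colocation.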
> Does this have anything to do with the fact that I'm trying to
> colocate the IP with a clone? Or am I looking in the wrong direction?
>
> I tried removing DBIP from the group, and it got moved to another
> node. Obviously, everything else was left on the first one. Then I
> tried adding a colocation of DBIP with the postgres resources (and the
> other way around), and if the score of that rule is high enough, the
> IP gets moved back, but I was never able to get postgres moved to the
> second node (where the IP is) instead.
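If the goal is for the whole PostgreSQL stack (and, via your existing
postgres_on_drbd constraint, the DRBD Master) to follow tomcat rather
than only the address, one way to express that intent would be to
colocate the group instead of the single IP. An untested sketch, with
a constraint name that is made up:

  # colocate the whole group with the tomcat clone instead of only DBIP
  colocation postgres-on-web inf: postgres tomcats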
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org