----- Original Message -----
> From: "Phil Frost" <p...@macprofessionals.com>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Sent: Monday, June 18, 2012 9:39:48 AM
> Subject: [Pacemaker] resources not migrating when some are not runnable on
> one node, maybe because of groups or master/slave clones?
>
> I'm attempting to configure an NFS cluster, and I've observed that under
> some failure conditions, resources that depend on a failed resource
> simply stop, and no migration to another node is attempted, even though
> a manual migration demonstrates the other node can run all resources,
> and the resources will remain on the good node even after the migration
> constraint is removed.
>
> I was able to reduce the configuration to this:
>
> node storage01
> node storage02
> primitive drbd_nfsexports ocf:pacemaker:Stateful
> primitive fs_test ocf:pacemaker:Dummy
> primitive vg_nfsexports ocf:pacemaker:Dummy
> group test fs_test

Why don't you have vg_nfsexports in the group? Not really any point to a group with only one resource...

> ms drbd_nfsexports_ms drbd_nfsexports \
>         meta master-max="1" master-node-max="1" \
>         clone-max="2" clone-node-max="1" \
>         notify="true" target-role="Started"
> location l fs_test -inf: storage02
> colocation colo_drbd_master inf: ( test ) ( vg_nfsexports ) ( drbd_nfsexports_ms:Master )

You need an order constraint here too... Pacemaker needs to know in what order to start/stop/promote things. Something like:

order ord_drbd_master_first drbd_nfsexports_ms:promote vg_nfsexports:start test:start

> property $id="cib-bootstrap-options" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         last-lrm-refresh="1339793579"
>
> The location constraint "l" exists only to demonstrate the problem; I
> added it to simulate the NFS server being unrunnable on one node.
>
> To see the issue I'm experiencing, put storage01 in standby to force
> everything on storage02. fs_test will not be able to run. Now bring
> storage01 back online, which can satisfy all the constraints, and see
> that no migration takes place. Put storage02 in standby, and everything
> will migrate to storage01 and start successfully. Take storage02 out of
> standby, and the services remain on storage01. This demonstrates that
> even though there is a clear "best" solution where all resources can
> run, Pacemaker isn't finding it.
>
> So far, I've noticed any of the following changes will "fix" the
> problem:
>
> - removing colo_drbd_master
> - removing any one resource from colo_drbd_master
> - eliminating the group "test" and referencing fs_test directly in
>   constraints
> - using a simple clone instead of a master/slave pair for
>   drbd_nfsexports_ms
>
> My current understanding is that if there exists a way to run all
> resources, Pacemaker should find it and prefer it.
> Is that not the case?
> Maybe I need to restructure my colocation constraint somehow? Obviously
> this is a much reduced version of a more complex practical
> configuration, so I'm trying to understand the underlying mechanisms
> more than just the solution to this particular scenario.

Not positive, but try with the order statement added. Might clear it up.

HTH

Jake

>
> In particular, I'm not really sure how I inspect what Pacemaker is
> thinking when it places resources. I've tried running crm_simulate -LRs,
> but I'm a little bit unclear on how to interpret the results. In the
> output, I do see this:
>
> drbd_nfsexports:1 promotion score on storage02: 10
> drbd_nfsexports:0 promotion score on storage01: 5
>
> Those numbers seem to account for the default stickiness of 1 for
> master/slave resources, but don't seem to incorporate at all the
> colocation constraints. Is that expected?
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org