----- Original Message -----
> From: "Phil Frost" <p...@macprofessionals.com>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Sent: Monday, June 18, 2012 8:39:48 AM
> Subject: [Pacemaker] resources not migrating when some are not runnable on
>  one node, maybe because of groups or master/slave clones?
>
> I'm attempting to configure an NFS cluster, and I've observed that under
> some failure conditions, resources that depend on a failed resource
> simply stop, and no migration to another node is attempted, even though
> a manual migration demonstrates the other node can run all resources,
> and the resources will remain on the good node even after the migration
> constraint is removed.
>
> I was able to reduce the configuration to this:
>
> node storage01
> node storage02
> primitive drbd_nfsexports ocf:pacemaker:Stateful
> primitive fs_test ocf:pacemaker:Dummy
> primitive vg_nfsexports ocf:pacemaker:Dummy
> group test fs_test
> ms drbd_nfsexports_ms drbd_nfsexports \
>         meta master-max="1" master-node-max="1" \
>         clone-max="2" clone-node-max="1" \
>         notify="true" target-role="Started"
> location l fs_test -inf: storage02
> colocation colo_drbd_master inf: ( test ) ( vg_nfsexports ) ( drbd_nfsexports_ms:Master )
> property $id="cib-bootstrap-options" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         last-lrm-refresh="1339793579"
>
> The location constraint "l" exists only to demonstrate the problem; I
> added it to simulate the NFS server being unrunnable on one node.
>
> To see the issue I'm experiencing, put storage01 in standby to force
> everything onto storage02. fs_test will not be able to run. Now bring
> storage01, which can satisfy all the constraints, back online and see
> that no migration takes place. Put storage02 in standby, and everything
> will migrate to storage01 and start successfully. Take storage02 out of
> standby, and the services remain on storage01. This demonstrates that
> even though there is a clear "best" solution where all resources can
> run, Pacemaker isn't finding it.
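
For reference, the reproduction steps quoted above could be scripted roughly as follows. This is only a sketch, not something taken from the original post. It assumes the crm shell (`crm`), `crm_mon`, `crm_simulate` and `crm_report` are installed on the node where it runs. The node names come from the posted configuration. `DRY_RUN`, `SETTLE` and `DEST` are hypothetical knobs introduced here, and the 30-second settle time is a guess.

```shell
#!/bin/sh
# Sketch: walk through the standby/online sequence from the quoted message,
# then collect a report covering the whole test window.
# Assumptions (not from the original post): DRY_RUN, SETTLE and DEST are
# hypothetical variables; the settle time is a guess.

DRY_RUN="${DRY_RUN:-1}"            # 1 = only print commands (default), 0 = execute
SETTLE="${SETTLE:-30}"             # seconds to wait after each node state change
DEST="${DEST:-/tmp/standby-test}"  # hypothetical destination for the report

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    # Remember when the test started so the report covers all transitions.
    START=$(date '+%Y-%m-%d %H:%M:%S')

    run crm node standby storage01   # everything is forced onto storage02
    run sleep "$SETTLE"
    run crm_mon -1                   # fs_test is expected to be stopped here

    run crm node online storage01    # storage01 could satisfy every constraint
    run sleep "$SETTLE"
    run crm_mon -1                   # reported problem: nothing migrates
    run crm_simulate -s -L           # show allocation scores on the live cluster

    run crm node standby storage02   # resources should move to storage01
    run sleep "$SETTLE"
    run crm_mon -1

    run crm node online storage02    # resources are expected to stay put
    run sleep "$SETTLE"
    run crm_mon -1

    # Gather logs, the CIB and policy engine inputs from START until now.
    run crm_report -f "$START" "$DEST"
}

repro
```

The script defaults to dry-run mode because putting nodes in standby disrupts a live cluster. Setting `DRY_RUN=0` executes the commands for real.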

Can you attach a crm_report of what happens when you put the two nodes in
standby, please? Being able to see the XML and how the policy engine
evaluates the transitions is helpful.

-- Vossel

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org