Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

2012-07-30 Thread Phil Frost
On 07/29/2012 11:15 PM, Andrew Beekhof wrote: If I run: tools/crm_simulate -x ~/Dropbox/phil.xml -Ss | grep "promotion score" I see: drbd_exports:1 promotion score on storage02: 110 drbd_exports:0 promotion score on storage01: 6 The 100 coming from one of your rules which says:

Re: [Pacemaker] Complicated dependences between resources and nodes

2012-07-29 Thread Phil Frost
On 07/28/2012 06:46 AM, Antonis Christofides wrote: Hi, short questions: Is it possible to dictate that resource R1 runs on a different node than resource R2? Yes. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch06s04s02.html Is it possible when moving R1 from

Re: [Pacemaker] Help with N+1 configuration

2012-07-27 Thread Phil Frost
On 07/27/2012 11:48 AM, Cal Heldenbrand wrote: Why wouldn't my mem3 failover happen if it timed out stopping the cluster IP? If a stop action fails, pacemaker can't know if the resource is running, not running, or in some other broken state. The cluster is in an unknown state, and there's no

Re: [Pacemaker] Help with N+1 configuration

2012-07-26 Thread Phil Frost
On 07/26/2012 02:16 PM, Cal Heldenbrand wrote: That seems very handy -- and I don't need to specify 3 clones? Once my memcached OCF script reports a downed service, one of them will automatically transition to the current failover node? There are options for the clone on how many instances o

Re: [Pacemaker] Resource fails to stop

2012-07-26 Thread Phil Frost
On 07/26/2012 12:43 PM, Andrew Widdersheim wrote: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html "If STONITH is not enabled, then the cluster has no way to continue and will not try to start the resource elsewhere, but will try to stop it a

Re: [Pacemaker] Help with N+1 configuration

2012-07-26 Thread Phil Frost
On 07/26/2012 12:34 PM, Cal Heldenbrand wrote: Hi everybody, I've read through the Clusters from Scratch document, but it doesn't seem to help me very well with an N+1 (shared hot spare) style cluster setup. My test case, is I have 3 memcache servers. Two are in primary use (hashed 50/50 b

Re: [Pacemaker] [patch] Seeking suggestions for cluster configuration of HA iSCSI target and initiators]

2012-07-16 Thread Phil Frost
On 07/16/2012 01:34 PM, Phil Frost wrote: I've been doing some study of the iscsi RA since my first post, and it seems to me now that the "failure" in the monitor action isn't actually in the monitor action at all. Rather, it appears that for *all* actions, the RA does a

Re: [Pacemaker] Seeking suggestions for cluster configuration of HA iSCSI target and initiators

2012-07-16 Thread Phil Frost
On 07/16/2012 01:14 PM, Digimer wrote: I've only tested this a little, so please take it as a general suggestion rather than strong advice. I created a two-node cluster, using red hat's high-availability add-on, using DRBD to keep the data replicated between the two "SAN" nodes and tgtd to expor

[Pacemaker] Seeking suggestions for cluster configuration of HA iSCSI target and initiators

2012-07-16 Thread Phil Frost
I'm designing a cluster to run both iSCSI targets and initiators to ultimately provide block devices to virtual machines. I'm considering the case of a target failure, and how to handle that as gracefully as possible. Ideally, IO may be paused until the target recovers, but VMs do not restart o

Re: [Pacemaker] LIO + Pacemaker kernel oops on failover

2012-07-13 Thread Phil Frost
On 07/03/2012 02:38 PM, Phil Frost wrote: It seems there's something about the iSCSI RAs that hit a bug in LIO: http://comments.gmane.org/gmane.linux.scsi.target.devel/1568?set_cite=hide I seem to be hitting the same problem quite reliably whenever I migrate the iSCSI targets in my cl

[Pacemaker] LIO + Pacemaker kernel oops on failover

2012-07-03 Thread Phil Frost
It seems there's something about the iSCSI RAs that hit a bug in LIO: http://comments.gmane.org/gmane.linux.scsi.target.devel/1568?set_cite=hide I seem to be hitting the same problem quite reliably whenever I migrate the iSCSI targets in my cluster. Sounds like the OP was able to reach a suita

Re: [Pacemaker] Confusing semantics of colocation sets

2012-07-02 Thread Phil Frost
On 07/02/2012 12:50 PM, Dejan Muhamedagic wrote: What is being mangled actually? The crm shell does what is possible given the pacemaker RNG schema. It is unfortunate that the design is slightly off, but that cannot be fixed in the crm syntax. I will demonstrate my point by offering a quiz to t

Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

2012-06-29 Thread Phil Frost
On 06/28/2012 01:29 PM, David Vossel wrote: I've been looking into multistate resource colocations quite a bit this week. I have a branch I'm working with that may improve this situation for you. If you are feeling brave, test this branch out with your configuration and see if it fares better

Re: [Pacemaker] Time-based resource stickiness not working cleanly

2012-06-27 Thread Phil Frost
On 06/27/2012 02:33 PM, Velayutham, Prakash wrote: and the cluster works fine, except that when the fenced (STONITHed) node comes back up and joins the cluster, all resources (including the one that is running in its preferred location) get restarted. This is annoying and I am trying to find

Re: [Pacemaker] Time-based resource stickiness not working cleanly

2012-06-27 Thread Phil Frost
On 06/26/2012 04:33 PM, Velayutham, Prakash wrote: Any idea? Can a resource order constraint be specified depending on a primitive that is part of a clone resource? Is that even supported? Probably not. Usually you'd want to have your constraints reference the clone, not the primitive behind i

Re: [Pacemaker] MS-Resource never gets promoted

2012-06-27 Thread Phil Frost
On 06/27/2012 12:38 PM, Stallmann, Andreas wrote: I let the tomcat script write some quite elaborate debug output, which NEVER shows an attempt to promote the resource. Any ideas? Does your RA call crm_master? Otherwise you will have to include location constraints in your configuration statin

Re: [Pacemaker] Time-based resource stickiness not working cleanly

2012-06-26 Thread Phil Frost
On 06/26/2012 12:59 PM, Velayutham, Prakash wrote: Hi, I have a Corosync (1.3.0-5.6.1) / Pacemaker (1.1.5-5.5.5) cluster where I am using a Time-based rule for resource stickiness (http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-rules-cluster-options.html). Ever

Re: [Pacemaker] Collocating resource with a started clone instance

2012-06-26 Thread Phil Frost
On 06/22/2012 05:58 AM, Sergey Tachenov wrote: Why the score for the DBIP is -INFINITY on the srvplan2? The only INF rule in my config is the collocation rule for the postgres group. This sounds not unlike an issue I'm experiencing. See here: http://oss.clusterlabs.org/pipermail/pacemaker/2012-

Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

2012-06-26 Thread Phil Frost
On 06/22/2012 04:40 AM, Andreas Kurz wrote: I took a look at the cib in case2 and saw this in the status for storage02. [...] storage02 will not give up the drbd master since it has a higher score than storage01. This coupled with

Re: [Pacemaker] "Grouping" of master/slave resources

2012-06-25 Thread Phil Frost
On 06/25/2012 12:14 PM, Stallmann, Andreas wrote: but crm won’t let me put Master/Slave resource into a group. Why is that? I don't think it's possible to specify the state a resource must be in (master) within a group constraint. However, groups are just (modulo bugs) shorthand for order and

Re: [Pacemaker] Two slave nodes, neither will promote to Master

2012-06-25 Thread Phil Frost
On 06/25/2012 11:48 AM, Regendoerp, Achim wrote: As it is currently, both nodes are online and configured, but none are switching to Master. In lack of a DRBD resource, I tried using the Dummy Pacemaker. If that's not the correct RA, please enlighten me on this too. To simulate a DRBD resource,

Re: [Pacemaker] Cannot start VirtualDomain resource after restart

2012-06-20 Thread Phil Frost
On 06/20/2012 01:48 PM, Kadlecsik József wrote: Your crystal ball worked perfectly:-) - it was the memory utilization. I don't know if you already found it, but crm_simulate has an option, "-U", to display the utilization calculations, and if you crank up the verbosity (specify -V a couple time

Re: [Pacemaker] Cannot start VirtualDomain resource after restart

2012-06-20 Thread Phil Frost
On 06/20/2012 01:09 PM, Kadlecsik József wrote: On Wed, 20 Jun 2012, Phil Frost wrote: Firstly, I'd try running "crm_simulate -LS -D pacemaker.dot", then viewing the generated pacemaker.dot with graphviz [1] (specifically "dot"). It might also be helpful to pass pa

Re: [Pacemaker] Cannot start VirtualDomain resource after restart

2012-06-20 Thread Phil Frost
On 06/20/2012 10:11 AM, emmanuel segura wrote: I don't know, but I see the failure is in the operation lx0_monitor_0, so I ask someone with more experience than me: does pacemaker do a monitor operation before start? I'm just learning Pacemaker myself, so I could be wrong on some points. I don

Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

2012-06-19 Thread Phil Frost
On 06/19/2012 04:31 PM, David Vossel wrote: Can you attach a crm_report of what happens when you put the two nodes in standby please? Being able to see the xml and how the policy engine evaluates the transitions is helpful. The resulting reports were a bit big for the list, so I put them in

Re: [Pacemaker] Confusing semantics of colocation sets (stopping resource stops others in colocation / order sets)

2012-06-18 Thread Phil Frost
On 06/15/2012 05:00 PM, Jake Smith wrote: # also creates three sets colocation colo inf: A B C:Master D E # B -> A -> C -> E -> D Yes because C is a stateful resource. When you tested this I assume you used Dummy resource for A,B,D,E and a Stateful resource for primitive_C and created a ma

Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

2012-06-18 Thread Phil Frost
On 06/18/2012 10:05 AM, Jake Smith wrote: Why don't you have vg_nfsexports in the group? Not really any point to a group with only one resource... You need an order constraint here too... Pacemaker needs to know in what order to start/stop/promote things. Something like: order ord_drbd_maste

Re: [Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

2012-06-18 Thread Phil Frost
On 06/18/2012 10:14 AM, Vladislav Bogdanov wrote: Sets (constraints with more than two members) are evaluated in a different order. Try colocation colo_drbd_master inf: ( drbd_nfsexports_ms:Master ) ( vg_nfsexports ) ( test ) I'm sure that's the wrong order. I've put the parens on each resour

[Pacemaker] resources not migrating when some are not runnable on one node, maybe because of groups or master/slave clones?

2012-06-18 Thread Phil Frost
I'm attempting to configure an NFS cluster, and I've observed that under some failure conditions, resources that depend on a failed resource simply stop, and no migration to another node is attempted, even though a manual migration demonstrates the other node can run all resources, and the reso

Re: [Pacemaker] [solved] stopping resource stops others in colocation / order sets

2012-06-15 Thread Phil Frost
On 06/15/2012 11:55 AM, David Vossel wrote: If resC is stopped resource stop resC then drbd_nfsexports is demoted, and resB and resC will stop. Why is that? I'd expect that resC, being listed last in both the colocation and It is the order constraint. Order constraints are symmetrical. I

Re: [Pacemaker] Confusing semantics of colocation sets (stopping resource stops others in colocation / order sets)

2012-06-15 Thread Phil Frost
On 06/14/2012 05:47 PM, Jake Smith wrote: So it should be resC on top of resB on top of DRBD:master. I think of collocation as being written in the reverse order of "order" statement. That's why resources in groups start in the order they are written and collocate in reverse from written order.

[Pacemaker] stopping resource stops others in colocation / order sets

2012-06-14 Thread Phil Frost
I'm sure this is a typical novice question, but I've been dancing around this problem for a day without any real progress, so could use some more experienced eyes. I'm setting up what must be a pretty normal NFS / DRBD / LVM, 2 node, active / passive cluster. Everything works, mostly, but it do

Re: [Pacemaker] Using shadow configurations noninteractively

2012-03-19 Thread Phil Frost
On Mar 19, 2012, at 15:22, Florian Haas wrote: > On Mon, Mar 19, 2012 at 8:00 PM, Phil Frost wrote: >> I'm attempting to automate my cluster configuration with Puppet. I'm already >> using Puppet to manage the configuration of my Xen domains. I'd like to

[Pacemaker] Using shadow configurations noninteractively

2012-03-19 Thread Phil Frost
I'm attempting to automate my cluster configuration with Puppet. I'm already using Puppet to manage the configuration of my Xen domains. I'd like to instruct puppet to apply the configuration (via cibadmin) to a shadow config, but I can't find any sure way to do this. The issue is that running "

[Pacemaker] Configuring multicast on Extreme Networks switches

2012-03-17 Thread Phil Frost
I'm experiencing some problems with pacemaker where everything works fine, unless I wait a few minutes after starting corosync. Then, most things I've tried (adding resources, stopping pacemaker...) fail, and syslog is flooded with "retransmit list" errors, several per second. I think the probl