Re: [Pacemaker] Replacing sbd devices in running cluster

2015-01-30 Thread Lars Marowsky-Bree
On 2015-01-30T08:29:18, emmanuel segura wrote: > from one of two: > > /dev/sdX and /dev/sdY > > sbd -d "/dev/sdX;/dev/sdY" message node1 exit > sbd -d "/dev/sdX;/dev/sdY" message node2 exit > > sbd -d /dev/sdA create && sbd -d /dev/sdB create > > Now in every cluster node > > sbd -d "/dev/sd

Re: [Pacemaker] Problems with SBD

2015-01-07 Thread Lars Marowsky-Bree
On 2015-01-04T19:49:58, Oriol Mula-Valls wrote: > I have a two node system with SLES 11 SP3 (pacemaker-1.1.9-0.19.102, > corosync-1.4.5-0.18.15, sbd-1.1-0.13.153). Since December we started to > have several reboots of the system due to SBD; 22nd, 24th and 26th. Last > reboot happened yesterday J

Re: [Pacemaker] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015

2014-11-26 Thread Lars Marowsky-Bree
On 2014-11-25T16:46:01, David Vossel wrote: Okay, okay, apparently we have got enough topics to discuss. I'll grumble a bit more about Brno, but let's get the organisation of that thing on track ... Sigh. Always so much work! I'm assuming arrival on the 3rd and departure on the 6th would be the

Re: [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Lars Marowsky-Bree
On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > > Yeah, well, devconf.cz is not such an interesting event for those who do > > not wear the fedora ;-) > That would be the perfect opportunity for you to convert users to Suse ;) > >> I'd prefer, at least for this round, to keep dates/location

Re: [Pacemaker] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-24 Thread Lars Marowsky-Bree
On 2014-11-24T15:54:33, "Fabio M. Di Nitto" wrote: > dates and location were chosen to piggy-back with devconf.cz and allow > people to travel for more than just HA Summit. Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) > I'd prefer, at least fo

Re: [Pacemaker] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-24 Thread Lars Marowsky-Bree
On 2014-09-08T12:30:23, "Fabio M. Di Nitto" wrote: Folks, Fabio, thanks for organizing this and getting the ball rolling. And again sorry for being late to said game; I was busy elsewhere. However, it seems that the idea for such a HA Summit in Brno/Feb 2015 hasn't exactly fallen on fertile gro

Re: [Pacemaker] supportted arch for pacemaker

2014-09-25 Thread Lars Marowsky-Bree
On 2014-09-25T16:49:38, Gang U Xu wrote: > I searched from google and can only find rpm for rhel with x86 arch > Can pacemaker be installed on rhel 6/7 with ppc64 arch and s390x arch? Pacemaker and the entire stack (corosync, OCFS2, libdlm, clvm2 etc) run fine on s390x and the various ppc flavours.

Re: [Pacemaker] SLES 11 SP3 boothd behaviour

2014-08-27 Thread Lars Marowsky-Bree
On 2014-08-27T13:31:21, "Sutherland, Rob" wrote: > [Rob] Already done. It's unfortunate that SuSE only ships version 0.1.0. The newer booth version has a fundamentally different algorithm and network protocol (since it drops Paxos in favor of Raft) and thus isn't a drop-in replacement for SLE

Re: [Pacemaker] clusters on virtualised platforms

2014-07-17 Thread Lars Marowsky-Bree
On 2014-07-17T10:36:01, Nick Cameo wrote: > "Instead, have the HA hypervisor layer protect the VM as a clustered > service" > > I had to read this a couple of times Lars, and it's interesting. If I > understand correctly run the cluster on bare metal, taking care of the > virtual machine instanc

Re: [Pacemaker] clusters on virtualised platforms

2014-07-17 Thread Lars Marowsky-Bree
On 2014-07-17T03:48:51, Alex Samad - Yieldbroker wrote: > I wonder if there is a best practice or how-to on how to run clusters on, say, > VMware. We've got many customers running SLE HA (pacemaker/corosync) clusters inside virtual machines. That works fine. There are a few obvious caveats. Make sur

Re: [Pacemaker] is_managed meta attribute on suse 10 with heartbeat + pacemaker

2014-07-15 Thread Lars Marowsky-Bree
On 2014-07-15T14:03:00, emmanuel segura wrote: > Thanks Lars for your answer, > > I was using crm_resource, because using crmsh i can't start a resource > on a specific node. crm_resource can't do that either directly; you need to do this in two steps, moving a resource to a node and then starti

Re: [Pacemaker] is_managed meta attribute on suse 10 with heartbeat + pacemaker

2014-07-15 Thread Lars Marowsky-Bree
On 2014-07-15T12:44:49, emmanuel segura wrote: > Hello Lars, > > I saw the same problem on suse 11, this is the example command i found > in the man page for stopping a resource > > man crm_resource > Start or stop a resource: Please use the crm shell to start/stop resources. crm res

Re: [Pacemaker] Interested in becoming a cluster developer?

2014-07-15 Thread Lars Marowsky-Bree
On 2014-07-15T17:04:52, Andrew Beekhof wrote: > Enjoy tinkering with clusters and have a background in software development? > There might be some positions working with yours truly at Red Hat soon, drop > me a note if you're interested. > Hitting Reply-All can and will be used against you :-)

Re: [Pacemaker] is_managed meta attribute on suse 10 with heartbeat + pacemaker

2014-07-12 Thread Lars Marowsky-Bree
On 2014-07-10T21:57:24, emmanuel segura wrote: > I know heartbeat is deprecated, but we have an old cluster, and today > i tried to disable the cluster monitoring for maintenance on a resource > using the following command "crm_resource -r myresource -t primitive > -p is_managed -v off", but after

Re: [Pacemaker] Alternative communication engine to corosync (etcd/consul/zookeeper/doozerd)

2014-06-21 Thread Lars Marowsky-Bree
On 2014-06-20T11:32:58, Patrick Hemmer wrote: > > I rather doubt it. k/v stores and CPG are not very alike from where I'm > > sitting. > No, they are not alike, but you could implement something looking like CPG. > When a key is created, that's a CPG message. They support atomic > operations (c

Re: [Pacemaker] SOLVED: node is unable to join cluster after upgrade (crmd dies)

2014-06-19 Thread Lars Marowsky-Bree
On 2014-06-18T21:49:51, "Krause, Markus" wrote: > as I noticed that some permissions on the first node (sql01a) looked „weird“ > I decided to reinstall the whole system on this node and after copying all > the configuration from the second host everything is working as expected > without furth

Re: [Pacemaker] [Fuel][HA] Notifying clones of offline nodes

2014-05-27 Thread Lars Marowsky-Bree
On 2014-05-27T10:02:44, Andrew Beekhof wrote: > > We are working on HA solutions for OpenStack(-related) services and figured > > out that sometimes we need clones to be notified if one of the cluster > > nodes running clone instances goes offline. E.g., we need this information > > to make Ra

Re: [Pacemaker] failed actions are not removed

2014-04-01 Thread Lars Marowsky-Bree
On 2014-04-01T14:41:11, Attila Megyeri wrote: > Hi Andrew, all, > > We use Pacemaker 1.1.10, with corosync 2.2.3 and we notice that failed > actions are not reset after the cluster recheck interval has elapsed. > Is this a known issue, or shall I provide some more details? What have you set f

Re: [Pacemaker] How to avoid dual fencing ?

2014-03-26 Thread Lars Marowsky-Bree
On 2014-03-25T02:39:27, Digimer wrote: > Of course, "unlikely" is rarely good enough in the HA world. So I am glad > you disabled acpid to be safe. :) "Unlikely" is all there is. There is no guarantee on anything except statistical (un)likelihood. ;-) Regards, Lars -- Architect Storage/

Re: [Pacemaker] 2-node cluster with shared storage: what is current solution

2014-03-19 Thread Lars Marowsky-Bree
On 2014-03-19T19:20:35, Саша Александров wrote: > Now, we got shared storage over multipath FC there, so we need to move from > drbd to shared storage. And I got totally confused now - I can not find a > guide on how to set things up. I see two options: > - use gfs2 > - use ext4 with sbd If you

Re: [Pacemaker] fencing question

2014-03-12 Thread Lars Marowsky-Bree
On 2014-03-12T16:16:54, Karl Rößmann wrote: > >>primitive fkflmw ocf:heartbeat:Xen \ > >>meta target-role="Started" is-managed="true" allow-migrate="true" \ > >>op monitor interval="10" timeout="30" \ > >>op migrate_from interval="0" timeout="600" \ > >>op migrate_

Re: [Pacemaker] fencing question

2014-03-12 Thread Lars Marowsky-Bree
On 2014-03-12T15:17:13, Karl Rößmann wrote: > Hi, > > we have a two node HA cluster using SuSE SLES 11 HA Extension SP3, > latest release value. > A resource (xen) was manually stopped, the shutdown_timeout is 120s > but after 60s the node was fenced and shut down by the other node. > > should

Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-09 Thread Lars Marowsky-Bree
On 2014-03-07T09:08:41, Attila Megyeri wrote: > One more thing to add. I did an apt-get upgrade on one of the nodes, and then > restarted the node. It resulted in this state on all other nodes again... 2.3.0 is not the most recent corosync version. 2.3.3 (and possibly the git tree) contain quit

Re: [Pacemaker] Hawk session ends after start or stop action

2014-03-04 Thread Lars Marowsky-Bree
On 2014-03-03T15:48:07, "Schaefer, Diane E" wrote: Hi Diane, > I am running pacemaker on SLES 11 SP3 and have applied the update package > released in December. The hawk level is 0.6.1-0.11.1 and lighttpd is > 1.4.20-2.52.1 . When I log into hawk using firefox, google chrome or IE 9 all >

Re: [Pacemaker] cold-start to standby?

2014-03-01 Thread Lars Marowsky-Bree
On 2014-03-01T00:14:25, Matthew O'Connor wrote: > I have had a few instances recently where circumstances conspired to > bring my cluster down completely and most non-gracefully (and this was > in spite of a relatively new 10kVA UPS). When bringing the nodes back > online, it would be enormously

Re: [Pacemaker] crm_gui and its dependencies

2014-02-24 Thread Lars Marowsky-Bree
On 2014-02-24T10:21:29, Jan Bicek wrote: > Hi, i would like to ask if it is necessary for crm_gui to have all these > dependencies. I would appreciate it if crm_gui were an independent piece > of sw, so i could install only it on my notebook and use it to manage a > few clusters. Would it be possible?

Re: [Pacemaker] Migrating resources on custom conditions

2014-02-21 Thread Lars Marowsky-Bree
On 2014-02-21T13:02:23, Vladislav Bogdanov wrote: > It could be nice feature to have kind of general SLA concept (it could > be very similar to the utilization one from the resource configuration > perspective), so resources try to move or live migrate out of nodes > which have SLA attributes bel

Re: [Pacemaker] possible regex error in "pcs resource enable/disable"

2014-02-20 Thread Lars Marowsky-Bree
On 2014-02-20T16:03:36, Bob Haxo wrote: > Sooo, seems that we need to kick this over the fence to the crmsh > folks ... or the SUSE folks, if they are maintaining crmsh. Yes, please file a bug. This shouldn't happen (I thought we had exorcized this class of bugs.) https://savannah.nongnu.org/bu

Re: [Pacemaker] possible regex error in "pcs resource enable/disable"

2014-02-20 Thread Lars Marowsky-Bree
On 2014-02-19T14:39:30, Bob Haxo wrote: > Chris, was easy to duplicate ... I thought that I had cleared > the error, but that had not happened. > > Bob Haxo > > [root@mici-admin ~]# pcs resource disable virt > [root@mici-admin ~]# pcs resource disable libvirtd-clone > Error: Error performing o

Re: [Pacemaker] lrmd fork: cannot allocate memory

2014-02-13 Thread Lars Marowsky-Bree
On 2014-02-13T11:45:07, walter.pis...@erptech.it wrote: > Thanks Lars, > > I understand that i have an out-of-memory situation, but I don't understand > because swap memory is 100% free (128Gb). Hard to say without knowing what leaked the memory. If it's a process that locks itself into RAM, t

Re: [Pacemaker] lrmd fork: cannot allocate memory

2014-02-12 Thread Lars Marowsky-Bree
On 2014-02-12T14:56:09, walter.pis...@erptech.it wrote: > This is sar on node 1 > > > > 11:10:01 958244 131164608 99,27 2282996 115595888 87761480 > 32,95 > 11:20:01 903164 131219688 99,32 2289980 115604020 87799716 > 32,97 > 11:30:01 1101560 131021292

Re: [Pacemaker] lrmd fork: cannot allocate memory

2014-02-12 Thread Lars Marowsky-Bree
On 2014-02-12T11:57:16, walter.pis...@erptech.it wrote: This is still 1.1.7 with the LRM from cluster-glue. All the log messages point to, well, an out-of-memory error on that node. > Can this error "Cannot allocate memory" to indicate that there cannot be any > memory allocated for a new Resou

Re: [Pacemaker] pacemaker. config safe and create a new cluster?

2014-02-11 Thread Lars Marowsky-Bree
On 2014-02-11T11:38:15, Beo Banks wrote: > can i use this configuration to create a new cluster system? > maybe with crm configure save > whatever-bak > change the hostname/ip in whatever-bak > copy the file to the new cluster system install all services > and then crm configure load whatever-bak

Re: [Pacemaker] Stop vs orphan stop using notify

2014-01-30 Thread Lars Marowsky-Bree
On 2014-01-29T21:42:52, Yogesh Patil wrote: > I am creating a clone of a group. Although clone-max is 2, it creates > resources on all nodes in the cluster and then converts all but 2 to orphan > stop state(crm_mon output). I want to omit such state but all I get is > stopped resource. This seem

Re: [Pacemaker] Stop vs orphan stop using notify

2014-01-29 Thread Lars Marowsky-Bree
On 2014-01-29T14:00:38, Yogesh Patil wrote: > I am doing some destructive actions on stopped state. But I don't want to > do them if it is an ORPHAN stop. How can I differentiate in CRM_notify_* values > coming from environment variables. What do you mean by "orphan stop"? And how do your services en

Re: [Pacemaker] Having a really hard time with clvmd on RHEL 7 beta

2014-01-27 Thread Lars Marowsky-Bree
On 2014-01-27T13:15:23, Digimer wrote: > I try to configure clvmd this way: > > > pcs cluster cib clvmd_cfg > pcs -f clvmd_cfg resource create clvmd lsb:clvmd params daemon_timeout=30s > op monitor interval=60s Hmmm. Something is not matching up here. "lsb" resources can't take parameters,

Re: [Pacemaker] Announce: SNMP agent for pacemaker

2014-01-22 Thread Lars Marowsky-Bree
On 2014-01-22T09:37:33, Michael Schwartzkopff wrote: > Hi, > > I am working on an SNMP agent for pacemaker. It is written in Perl. At the > moment it is in an alpha stage. > > Any volunteers for testing? I'd be quite curious to learn more about this for sure. Also about the choice to write

Re: [Pacemaker] command to dump cluster configuration in "pcs" format?

2014-01-16 Thread Lars Marowsky-Bree
On 2014-01-17T07:40:34, Andrew Beekhof wrote: > > Well, unless RHT states that installing crmsh on top of their > > distribution invalidates support for the pacemaker back-end, you could > > just ship crmsh as part of your product on that platform. > That's not how RHT operates I'm afraid. If som

Re: [Pacemaker] command to dump cluster configuration in "pcs" format?

2014-01-16 Thread Lars Marowsky-Bree
On 2014-01-16T09:21:33, Bob Haxo wrote: > > Curious if you can push these upstream too ;-) (Or already have.) > I'll report the issue and I'll include my hack. But it is a hack. I > know that I do not have a general solution. Thanks! > Yes, I install crmsh for development and will ship with t

Re: [Pacemaker] command to dump cluster configuration in "pcs" format?

2014-01-16 Thread Lars Marowsky-Bree
On 2014-01-15T20:25:30, Bob Haxo wrote: > Unfortunately, it has taken me weeks to develop (what now > seems to be) a working configuration (including mods to the > VirtualDomain agent to avoid spurious restarts of the VM). Curious if you can push these upstream too ;-) (Or already

Re: [Pacemaker] How to configure hearbeat using private network?

2014-01-13 Thread Lars Marowsky-Bree
On 2014-01-12T18:53:50, John Wei wrote: > I believe corosync does support this. Can someone point me to the document > on how to do this. Just configure corosync to use the private network interface via the bindnetaddr. -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennife
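A minimal sketch of what that might look like in corosync.conf, assuming a private network of 192.168.100.0/24. The addresses, multicast group, and port below are placeholders chosen for illustration, not values from this thread; bindnetaddr is set to the network address of the interface corosync should bind to.

```
totem {
    version: 2
    interface {
        ringnumber: 0
        # Network address of the private interface (assumed 192.168.100.0/24)
        bindnetaddr: 192.168.100.0
        # Placeholder multicast group and port
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
}
```

With a node address of, say, 192.168.100.11 and netmask 255.255.255.0, the value to use would be 192.168.100.0, which lets the same file be shared by every node on that network.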

Re: [Pacemaker] possible node status

2014-01-09 Thread Lars Marowsky-Bree
On 2014-01-08T23:31:03, Michael Schwartzkopff wrote: > > What part of the stack are you asking about? Those that can show in the > > CIB, that crm_mon shows, that pengine computes ...? > That in the CIB and/or those that crm_mon reports. Basically those states that > the monitoring would like to

Re: [Pacemaker] possible node status

2014-01-08 Thread Lars Marowsky-Bree
On 2014-01-08T22:30:29, Michael Schwartzkopff wrote: > Hi, > > what are, from the pacemaker point of view, all possible node statuses? > > online|standby|offline|... > > Any future extensions planned? What part of the stack are you asking about? Those that can show in the CIB, that crm_mon shows

Re: [Pacemaker] Howto check if the current node is active?

2014-01-07 Thread Lars Marowsky-Bree
On 2014-01-07T12:33:01, "Bauer, Stefan (IZLBW Extern)" wrote: > Hi Folks! > > How can i check if the current node i'm connected to is the active one? > It should be parseable because i want to use it in a script. What do you mean by "active"? And on what platform? "systemctl status pacemaker" w

Re: [Pacemaker] constraint colocation or resource group

2014-01-02 Thread Lars Marowsky-Bree
On 2014-01-02T11:22:01, Luc Paulin wrote: > That makes sense to use the colocation. So I guess that I should define a > "master" resource and tell each other resource that they should be colocated > on the same node as the "master" resource. You could also use a resource set to achieve this, but I d

Re: [Pacemaker] Show in-process operations?

2013-12-26 Thread Lars Marowsky-Bree
On 2013-12-26T12:07:21, Patrick Hemmer wrote: > Currently operations only show up when they've completed (crm_mon -o). > At a glance, it looks like this is because the CIB doesn't update until the > operation is complete (and doesn't list monitor operations at all unless > they've failed). But is th

Re: [Pacemaker] crmsh: New syntax for location constraints, suggestions / comments

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-14T01:11:17, Vladislav Bogdanov wrote: > > The idea was to offer an additional construct that provides both > > properties, since *most of the time*, that's what users want. In the > > interest of clarity and brevity in the configuration, this would be > > quite useful. > group? group

Re: [Pacemaker] What pacemaker doc to get and where to get software

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-13T19:32:53, "John Wei (John, RDCC)" wrote: > I am confused on where and what to download pacemaker's doc and software > > On the documentation: > I am running SUSE, per http://clusterlabs.org/doc/ Hi John, which version of SUSE are you running? For openSUSE, you can find packages i

Re: [Pacemaker] crmsh: New syntax for location constraints, suggestions / comments

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-13T13:11:30, Rainer Brestan wrote: > Please do not merge colocation and order together in a way that only none or > both is present. This was never the plan. The idea was to offer an additional construct that provides both properties, since *most of the time*, that's what users want.

Re: [Pacemaker] crmsh: New syntax for location constraints, suggestions / comments

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-13T11:46:05, Kristoffer Grönlund wrote: > This worries me as well, however the current syntax for constraints is > confusing and error-prone. Right. At least the { } would make it clear to users that it's now a resource set and not merely more than 2 in the same sequence. > It would

Re: [Pacemaker] crmsh: New syntax for location constraints, suggestions / comments

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-13T13:51:27, Andrey Groshev wrote: > Just thought that I was missing in "location", something like: node=any :) Can you describe what this is supposed to achieve? "any" is the default for symmetric clusters anyway. Regards, Lars -- Architect Storage/HA SUSE LINUX Products Gmb

Re: [Pacemaker] crmsh: New syntax for location constraints, suggestions / comments

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-13T10:16:41, Kristoffer Grönlund wrote: > Lars (lmb) suggested that we might switch to using the { } - brackets > around resource sets everywhere for consistency. My only concern with > that would be that it would be a breaking change to the previous crmsh > syntax. Maybe that is okay

Re: [Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped

2013-12-09 Thread Lars Marowsky-Bree
On 2013-12-06T16:06:09, Patrick Hemmer wrote: Hi Patrick, > > For a resource that pacemaker expects to be started, it's an error if it > > is found to be stopped. Pacemaker can't tell if it is really cleanly > > stopped, or died, or ... > Oh, and I'll quote the OCF spec on this one: > > 1 g

Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Lars Marowsky-Bree
On 2013-12-09T10:28:32, "Bauer, Stefan (IZLBW Extern)" wrote: > location groupwithping cluster1 \ > rule $id="groupwithping-rule" pingd: defined pingd I tend to prefer a -inf score for nodes where pingd is *not* defined or zero. (Downside is that when you lose all connectivity, all ser
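One way the stricter variant described above could be written in crm shell syntax, as a sketch only: the constraint id groupwithping-strict is made up for illustration, while the resource name cluster1 and the pingd attribute come from the quoted configuration.

```
# Hypothetical constraint id; bans cluster1 from any node where the
# pingd attribute is undefined or not greater than zero.
location groupwithping-strict cluster1 \
    rule -inf: not_defined pingd or pingd lte 0
```

Unlike the positive "pingd: defined pingd" score, this form forbids placement on nodes without connectivity rather than merely preferring connected nodes.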

Re: [Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped

2013-12-06 Thread Lars Marowsky-Bree
On 2013-12-06T11:21:02, Patrick Hemmer wrote: > > So where is the problem? If the script returns "ERROR" then pacemaker has > > to > > act accordingly. > If the script returns "ERROR" the `on-fail=ignore` should make it do > nothing. Amazon's API failed, we need to just retry again later. > If

Re: [Pacemaker] Why Pacemaker automatically creates constraints ?

2013-12-06 Thread Lars Marowsky-Bree
On 2013-12-06T09:54:19, Gaëtan Slongo wrote: > I know this is caused by the "-inf" but I didn't explicitly create this > constraint ... Pacemaker did it itself... :-( No, it did this because you *asked it to*. > This constraint is also created when the resource moves automatically. No. This i

Re: [Pacemaker] Pacemaker 1.1.10 and pacemaker-remote

2013-12-06 Thread Lars Marowsky-Bree
On 2013-12-06T08:55:47, Vladislav Bogdanov wrote: > BTW, pacemaker cib accepts any meta attributes (and that is very > convenient way for me to store some 'meta' information), while crmsh > limits them to a pre-defined list. While that is probably fine for > novices, that limits some advanced usa

Re: [Pacemaker] Why Pacemaker automatically creates constraints ?

2013-12-06 Thread Lars Marowsky-Bree
On 2013-12-06T09:00:32, Gaëtan Slongo wrote: > OK I understand, but this causes trouble for me... Example: When the > node holding the resource (and the constraint) reboots the resource is > not moving to the other node (because of this constraint, I see on the > debug logs no node can hold the r

Re: [Pacemaker] Pacemaker 1.1.10 and pacemaker-remote

2013-12-05 Thread Lars Marowsky-Bree
On 2013-12-05T10:10:08, James Oakley wrote: > That's why I used crm_resource to add it. If I dump the config with cibadmin, > it looks consistent with the example here: > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/#_mile_high_view_of_configuration_steps Right

Re: [Pacemaker] Pacemaker 1.1.10 and pacemaker-remote

2013-12-05 Thread Lars Marowsky-Bree
On 2013-12-05T08:57:48, James Oakley wrote: > Unfortunately, crm does not let me add it. I tried forcing it using > crm_resource, but it's not working. It shows up now: > > primitive lxc_db0 @lxc \ > params container="db0" config="/var/lib/lxc/db0/config" \ > meta remote-node="d

Re: [Pacemaker] Why Pacemaker automatically creates constraints ?

2013-12-05 Thread Lars Marowsky-Bree
On 2013-12-05T17:12:01, Gaëtan Slongo wrote: > Hi ! > > I've been trying to configure a cluster using pacemaker 1.1 and corosync for > 2 days (on Debian wheezy). Many of my current problems are related to > the constraint creation. When I manually move a resource (or when a > failback occurs) this ki

Re: [Pacemaker] catch-22: can't fence node A because node A has the fencing resource

2013-12-04 Thread Lars Marowsky-Bree
On 2013-12-03T19:47:41, "Brian J. Murrell" wrote: > So given all of the above, and given the log I supplied showing that the > fencing was just not being attempted anywhere other than the node to be > fenced (which was down during that log) any clues as to where to look > for why? As far as I sa

Re: [Pacemaker] Where the heck is Beekhof?

2013-11-28 Thread Lars Marowsky-Bree
On 2013-11-28T12:04:01, Andrew Beekhof wrote: > If you find yourself asking $subject at some point in the next couple of > months, the answer is that I'm taking leave to look after our new son (Lawson > Tiberius Beekhof) who was born on Tuesday. Congratulations! And all the best to you and you

Re: [Pacemaker] howto group resources without having an order

2013-11-26 Thread Lars Marowsky-Bree
On 2013-11-26T09:19:36, "Bauer, Stefan (IZLBW Extern)" wrote: > Hi, > > thank you for your input - unfortunately i want to go another path if > possible to not have to change more parts of my configuration: > > I have setup so far: > > group cluster1 p_eth0 p_conntrackd > location groupw

Re: [Pacemaker] p_mysql peration monitor failed 'not installed'

2013-11-22 Thread Lars Marowsky-Bree
On 2013-11-21T22:32:34, Miha wrote: > HI, > > what could be a reason for this error: > > notice: unpack_rsc_op: Preventing p_mysql from re-starting > on sip2: operation monitor failed 'not installed' (rc=5) > > > p_mysql_monitor_0 on sip2 'not installed' (5): call=22, > status=complete, las

Re: [Pacemaker] Need to relax corosync due to backup of VM through snapshot

2013-11-21 Thread Lars Marowsky-Bree
On 2013-11-20T16:58:01, Gianluca Cecchi wrote: > Based on docs I thought that the timeout should be > > token x token_retransmits_before_loss_const No, the comments in the corosync.conf.example and man corosync.conf should be pretty clear, I hope. Can you recommend which phrasing we should imp

Re: [Pacemaker] pacemaker update crash my config (cannot be represented in the CLI notation)

2013-11-20 Thread Lars Marowsky-Bree
On 2013-11-20T16:43:51, Beo Banks wrote: > INFO: object cli-prefer-mysql cannot be represented in the CLI notation > > > crm configure show | grep xml > INFO: object cli-prefer-mysql cannot be represented in the CLI notation > xml rsc="mysql" score="INFINITY"/> This does not mean your configu

Re: [Pacemaker] stonith ra class missing

2013-11-20 Thread Lars Marowsky-Bree
On 2013-11-20T11:20:45, Michael Schwartzkopff wrote: > I removed the pacemaker installation 1.1.9 from the opensuse build > server and installed the 1.1.10 from the RHEL-HA repository. now > everything is working as expected. > > Besides some kernel panics, that are not related to the cluster >

Re: [Pacemaker] some questions about STONITH

2013-11-20 Thread Lars Marowsky-Bree
On 2013-11-20T09:45:54, Andrey Groshev wrote: > > A "fence" request is executed when a node is deemed to be in an > > untrustworthy state - when a stop has failed, or when a network error > > occurs. Note that in the last case, login via ssh is obviously no longer > > possible at all. > In last c

Re: [Pacemaker] some questions about STONITH

2013-11-20 Thread Lars Marowsky-Bree
On 2013-11-19T19:20:43, "Masopust, Christian" wrote: > at this point I'd like to jump in as I'm completely new to fencing :) > > My question is: which node exactly does the fencing? One of the nodes that remain in the quorate partition. Regards, Lars -- Architect Storage/HA SUSE LINUX

Re: [Pacemaker] some questions about STONITH

2013-11-19 Thread Lars Marowsky-Bree
On 2013-11-19T23:06:04, Andrey Groshev wrote: > > First, like digimer wrote, clearly stonith-by-ssh is useless for > > production since you can't fence nodes that are having problems. But for > > testing, it's worth a try. > Maybe I do not quite understand correctly the term "fence" A "fence" re

Re: [Pacemaker] some questions about STONITH

2013-11-19 Thread Lars Marowsky-Bree
On 2013-11-19T22:10:29, Andrey Groshev wrote: First, like digimer wrote, clearly stonith-by-ssh is useless for production since you can't fence nodes that are having problems. But for testing, it's worth a try. Note that cluster-glue actually does include an external/ssh script. You're reinventi

Re: [Pacemaker] SBD fencing with stonith disabled

2013-11-19 Thread Lars Marowsky-Bree
On 2013-11-19T12:02:07, "Angel L. Mateo" wrote: > >Yes, you should also be able to use that. > But is it recommended for a two node cluster? I remember reading > somewhere that in such a scenario an sbd stonith is better because it provides > mechanism (but I could be wrong) True, e

Re: [Pacemaker] SBD fencing with stonith disabled

2013-11-19 Thread Lars Marowsky-Bree
On 2013-11-19T11:25:36, "Angel L. Mateo" wrote: > >>property $id="cib-bootstrap-options" \ > >> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ > >Wow, that's quite old. > It's pacemaker provided by ubuntu 12.04. Yeah, well. Still old. Probably something to complain t

Re: [Pacemaker] SBD fencing with stonith disabled

2013-11-19 Thread Lars Marowsky-Bree
On 2013-11-19T09:27:10, "Angel L. Mateo" wrote: > property $id="cib-bootstrap-options" \ > dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ Wow, that's quite old. > Nov 16 12:20:47 myotis51 sbd: [1377]: WARN: Latency: No liveness for 4 s > exceeds threshold of 3 s ( > healt

Re: [Pacemaker] configuration files of cluster

2013-11-18 Thread Lars Marowsky-Bree
On 2013-11-18T15:58:38, Dvorak Andreas wrote: > With pcs config I get a nice output, but I would like to have that in one or > more files. "crm configure save" does what you want. Or backup /var/lib/pacemaker. Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Haw

Re: [Pacemaker] route for ip resource missing

2013-11-14 Thread Lars Marowsky-Bree
On 2013-11-14T15:54:00, Dvorak Andreas wrote: > Dear all, > > I have set up an ip resource with > pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.40.70.38 > cidr_netmask=32 nic=eth2 op monitor interval=30s > but the route for eth2 is missing. I am able to switch the resource to that

Re: [Pacemaker] recover cib from raw file

2013-11-12 Thread Lars Marowsky-Bree
On 2013-11-12T09:51:02, "s.oreilly" wrote: > Brilliant, thanks Andrew. I was looking for a pcs option. Should have thought > about cibadmin. Hopefully I will never break things badly enough to have to > use > it :-) crm configure load xml ... ;-) Regards, Lars -- Architect Storage/HA SU

Re: [Pacemaker] resources does not start on survied node after reboot

2013-10-31 Thread Lars Marowsky-Bree
On 2013-10-29T18:12:51, Саша Александров wrote: > Oct 29 13:04:21 wcs2 pengine[2362]: warning: stage6: Scheduling Node wcs1 > for STONITH > Oct 29 13:04:21 wcs2 crmd[2363]: notice: te_fence_node: Executing reboot > fencing operation (53) on wcs1 (timeout=6) > Oct 29 13:05:33 wcs2 stonith-n

Re: [Pacemaker] IPaddr2 started but "not working"

2013-10-28 Thread Lars Marowsky-Bree
On 2013-10-28T09:38:34, Francesco Namuri wrote: > Hi, > I've a problem with an IPaddr2 resource, it appears to be started > correctly and it works for a while, but after it appears "to be hanged", > it doesn't work until I send it a restart. > Has anyone experienced a similar issue? You're run

Re: [Pacemaker] stonith resource "action" parameter ignored?

2013-10-18 Thread Lars Marowsky-Bree
On 2013-10-18T11:26:52, Nikola Ciprich wrote: > I'm using pacemaker-1.1.8 (RHEL6), fence-agents-4.0.3 (I compiled this myself > in order to use new netio stonith plugin), corosync-1.4.1, kernel-3.10.11 (in > case this would be important) Unless 1.1.8-rhel has some fixes backported, I think you'd

Re: [Pacemaker] stonith resource "action" parameter ignored?

2013-10-18 Thread Lars Marowsky-Bree
On 2013-10-18T11:04:26, Nikola Ciprich wrote: > Hi, > > I'm still trying to get fencing working with dual power supply / node > and either I'm blind to some dumb mistake of mine, or there's some > nasty pacemaker bug.. To anticipate the next obvious question, what's your pacemaker version? Re

Re: [Pacemaker] Stoping clone on one node

2013-10-17 Thread Lars Marowsky-Bree
On 2013-10-17T11:36:51, Andreas Mock wrote: > The only thing which comes to my mind is > creating a -inf location constraint temporarily. Yes, exactly that. I *think* you might try (in the XML, I'm pretty sure neither crmsh nor pcs can express that) to set the meta_attribute target-role in a nod

Re: [Pacemaker] Service restoration in clone resource group

2013-10-16 Thread Lars Marowsky-Bree
On 2013-10-16T09:21:47, Andrew Beekhof wrote: > The clone was created using the interleave=true option, yes. > You might want to trawl the raw xml to make sure pcs did the right thing. >cibadmin -Ql | grep interleave > > would tell you. "pcs resource show" actually just shows the summ

Re: [Pacemaker] Offline Cluster edit

2013-10-15 Thread Lars Marowsky-Bree
On 2013-10-15T17:05:39, Robert Lindgren wrote: > Worked like a charm, except: > > crm(live)configure# simulate actions nograph > > which I guess is only available in newer versions. If you have an older version, you can run "ptest" instead of "simulate". ("ptest" of course still works in ne

Re: [Pacemaker] Question about the resource to fence a node

2013-10-15 Thread Lars Marowsky-Bree
On 2013-10-15T18:24:46, Kazunori INOUE wrote: > Oct 15 15:17:16 vm2 stonith-ng[9160]: warning: log_operation: f1:9273 > [ Performing: stonith -t external/libvirt -T reset vm3 ] > Oct 15 15:17:46 vm2 stonith-ng[9160]: warning: log_operation: f1:9588 > [ Performing: stonith -t external/libvirt -T

Re: [Pacemaker] Offline Cluster edit

2013-10-15 Thread Lars Marowsky-Bree
On 2013-10-15T09:39:25, Robert Lindgren wrote: What I'd do is to backup, then wipe the cluster configuration (/var/lib/pacemaker/cib/*), restart with the empty configuration (which will also help with ids that have changed etc). And then: # crm configure crm(live)configure# load xml replace /pat

Re: [Pacemaker] node trying to run resource even in standby mode

2013-10-15 Thread Lars Marowsky-Bree
On 2013-10-10T14:41:49, Lev Sidorenko wrote: > When I create resource like: > # pcs resource create myres lsb:myres > it is created and can see straight away in crm_mon: > - > myres (lsb:myres):Started node3 (unmanaged) FAILED > Failed actions: > myres_stop

Re: [Pacemaker] Resource failover without writing to CIB

2013-10-08 Thread Lars Marowsky-Bree
On 2013-10-08T12:56:16, Sam Gardner wrote: > Is there any way to simply monitor the response of an arbitrary ocf monitor > call, and immediately fail the affected resource over? Yes. Set migration-threshold=1 for either the individual resource or globally. Regards, Lars -- Architect Stor
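As a rough illustration of the reply above, the setting could look like this in the raw CIB XML. This is only a sketch: the resource "myres", its Dummy agent, and all id values are hypothetical placeholders, not taken from the thread. The first fragment sets migration-threshold on one resource, the second sets it as a default for all resources.

```xml
<!-- Per-resource: "myres" and every id below are made-up placeholders -->
<primitive id="myres" class="ocf" provider="heartbeat" type="Dummy">
  <meta_attributes id="myres-meta">
    <!-- Move the resource away after its first failure on a node -->
    <nvpair id="myres-meta-migration-threshold"
            name="migration-threshold" value="1"/>
  </meta_attributes>
</primitive>

<!-- Global: the same attribute as a default for all resources -->
<rsc_defaults>
  <meta_attributes id="rsc-options">
    <nvpair id="rsc-options-migration-threshold"
            name="migration-threshold" value="1"/>
  </meta_attributes>
</rsc_defaults>
```

Note that the failed node stays ineligible for that resource until its fail count is cleared or expires (failure-timeout), so a cleanup is usually needed before the resource can move back.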

Re: [Pacemaker] Service restoration in clone resource group

2013-10-08 Thread Lars Marowsky-Bree
On 2013-10-08T09:29:14, Sean Lutner wrote: > The clone was created using the interleave=true option, yes. Ok, so pcs hides that (interesting to know). > Does this have an effect on what I'm trying to accomplish? Yes, if you hadn't set that, it might have been an explanation. My best guess rig

Re: [Pacemaker] Service restoration in clone resource group

2013-10-08 Thread Lars Marowsky-Bree
On 2013-10-07T11:33:28, Sean Lutner wrote: > Clone: EIP-AND-VARNISH-clone > Group: EIP-AND-VARNISH >Resource: Varnish (provider=redhat type=varnish.sh class=ocf) > Operations: monitor interval=30s >Resource: Varnishlog (provider=redhat type=varnishlog.sh class=ocf) > Operations

Re: [Pacemaker] How to do "crm resource cleanup" with the new pacemaker ?

2013-10-07 Thread Lars Marowsky-Bree
On 2013-10-07T12:22:40, Lev Sidorenko wrote: > On the good old pacemaker there was a crm shell which is substituted by > pcs now. > But I can't find how to cleanup a resource with pcs. pcs is not a substitute for crmsh, but an alternative. (Only RHEL/CentOS have dropped crmsh completely, I think

Re: [Pacemaker] stonith - using multiple fencing devices for one node to fence device with redundant power sources

2013-10-04 Thread Lars Marowsky-Bree
On 2013-10-03T23:50:15, Digimer wrote: > > digimer's hack works, but it makes my eyes bleed. ;-) > meanie! That's not because of what you diligently debugged and described, though, but because it's necessary. In my opinion, 90%+ of all setups that actually need to use more than one device per le

Re: [Pacemaker] stonith - using multiple fencing devices for one node to fence device with redundant power sources

2013-10-03 Thread Lars Marowsky-Bree
On 2013-10-03T12:22:27, David Vossel wrote: > > Is there some way to tell, node needs to be fenced using two fencing > > devices? Or I'll need to create my own fencing plugin allowing to > > use two fencing devices simultaneously? > Not simultaneously (not sure if that is actually a requirement),

Re: [Pacemaker] Problems when quorum lost for a short period of time

2013-10-02 Thread Lars Marowsky-Bree
On 2013-10-02T09:26:26, Lev Sidorenko wrote: > It is actually 2 nodes for main+standby and another two nodes just to > provide quorum. Like Andrew wrote, a third node would be enough for that purpose. You might as well run an iSCSI target on that node (instead of the full cluster stack) and use

Re: [Pacemaker] Solving a resource allocation problem

2013-09-19 Thread Lars Marowsky-Bree
On 2013-09-19T12:12:31, Andreas Mock wrote: > For a solution where I like to push a certain resource > to the new node (this service interruption doesn't > hurt too much) while being sure that the other gets > started on the newly upcoming node I have to balance > the stickiness and negative cons

Re: [Pacemaker] Solving a resource allocation problem

2013-09-19 Thread Lars Marowsky-Bree
On 2013-09-19T10:20:07, Andreas Mock wrote: > Hi all, > > I need a hint how to solve a resource allocation problem > on a two node cluster (pmck 1.1.11). > > I have two resource blocks (some stacked resources colocation inf) > which shall run on seperate nodes. I did this with a small negativ >

Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-19 Thread Lars Marowsky-Bree
On 2013-09-17T13:37:54, Andreas Mock wrote: > I have the problem that after a node rejoins the cluster some > resources are moved back to that node. > Now I want to see the calculated scores to see where I do > have to adjust the stickiness to get the behaviour I like. > > I'm not sure how to us

Re: [Pacemaker] monitor on disabled nodes

2013-09-19 Thread Lars Marowsky-Bree
On 2013-09-18T12:20:08, Radoslaw Garbacz wrote: > Sorry for not being specific. > > The agent is meant to run only on a specific node (the head), and by > constraints is disabled on all other nodes. > > 'pcs constraint' reports: > Location Constraints: > Resource: dbx_nfs_head > Enabled

Re: [Pacemaker] monitor on disabled nodes

2013-09-18 Thread Lars Marowsky-Bree
On 2013-09-18T11:13:46, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding the "monitor" operation on disabled nodes. > > I noticed that this operation is called even when an agent is disabled for > a node. Is it intended behavior or is there something wrong with my > configurat
