[Pacemaker] Call for review of undocumented parameters in resource agent meta data

2015-02-11 Thread Lars Ellenberg
On Fri, Jan 30, 2015 at 09:52:49PM +0100, Dejan Muhamedagic wrote: > Hello, > > We've tagged today (Jan 30) a new stable resource-agents release > (3.9.6) in the upstream repository. > > Big thanks go to all contributors! Needless to say, without you > this release would not be possible. Big tha

[Pacemaker] Announcing the Heartbeat 3.0.6 Release

2015-02-10 Thread Lars Ellenberg
orosync still. But typically, for new deployments involving Pacemaker, in most cases you should choose Corosync 2.3.x as your membership and communication layer. For existing deployments using Heartbeat, upgrading to this Heartbeat version is strongly recommended. Thanks, Lars Ellenberg signat

Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-02-09 Thread Lars Ellenberg
orruption, or data loss. I am personally willing to take the blame, and live with the consequences." Have some "boss" sign that ^^^ in the real world using a real pen. Lars -- : Lars Ellenberg : http://www.LINBIT.com | Your Way to High Availability

[Pacemaker] Patches: RFC before pull request

2014-12-09 Thread Lars Ellenberg
Andrew, All, Please have a look at the patches I queued up here: https://github.com/lge/pacemaker/commits/for-beekhof Most (not all) are specific for the heartbeat cluster stack. Thanks, Lars A few comments here: - This effectively changes crm_mon output, but also changes logging

Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-11-11 Thread Lars Ellenberg
ww/html? > > If so, what happens if you run 'ls -al /var/www/html' in a shell? > > > > Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running > > Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache > > /etc/httpd/conf/htt

Re: [Pacemaker] How to avoid CRM sending stop when ha.cf gets 2nd node configured

2014-11-10 Thread Lars Ellenberg
maintenance-mode", then you do your re-architecting of your cluster, and once you are satisfied with the new cluster, you take it out of maintenance mode again? At least that is one of the intended use cases for maintenance mode. -- : Lars Ellenberg : http://www.LINBIT.com | Your Way to H

Re: [Pacemaker] [ha-wg-technical] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-05 Thread Lars Ellenberg
On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote: > All the cool kids will be there. > > You want to be a cool kid, right? Well, no. ;-) But I'll still be there, and a few other Linbit'ers as well. Fabio, let us know what we could do to help make it happen. Lars > On 01/11/14 0

Re: [Pacemaker] can we update an attribute with cmpxchg "atomic compare and exchange" semantics?

2014-09-30 Thread Lars Ellenberg
On Tue, Sep 30, 2014 at 01:51:21PM +1000, Andrew Beekhof wrote: > > On 30 Sep 2014, at 6:22 am, Lars Ellenberg wrote: > > > On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote: > >> > >> Hi Andrew (and others). > >> > >> For a

Re: [Pacemaker] can we update an attribute with cmpxchg "atomic compare and exchange" semantics?

2014-09-29 Thread Lars Ellenberg
On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote: > > Hi Andrew (and others). > > For a certain use case (yes, I'm talking about DRBD "peer-fencing" on > loss of replication link), it would be nice to be able to say: > > update s

[Pacemaker] can we update an attribute with cmpxchg "atomic compare and exchange" semantics?

2014-09-10 Thread Lars Ellenberg
Hi Andrew (and others). For a certain use case (yes, I'm talking about DRBD "peer-fencing" on loss of replication link), it would be nice to be able to say: update some_attribute=some_attribute+1 where some_attribute >= 0 delete some_attribute where some_attribute=0 Ok, that's not the clas
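
The two statements above describe a conditional update: change or delete a value only if a predicate on its current value holds, as one indivisible step. A minimal Python sketch of that semantics on an in-memory store follows. `AttrStore`, `update_if`, and `delete_if` are hypothetical names used for illustration only; they are not an attrd or CIB API, and a lock inside one process says nothing about how a cluster-wide implementation would have to work.

```python
import threading


class AttrStore:
    """Toy in-memory attribute store (hypothetical, NOT a Pacemaker API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attrs = {}

    def set(self, name, value):
        with self._lock:
            self._attrs[name] = value

    def get(self, name):
        with self._lock:
            return self._attrs.get(name)

    def update_if(self, name, predicate, new_value):
        """Apply new_value(old) only if predicate(old) holds; return True if applied."""
        with self._lock:
            if name not in self._attrs or not predicate(self._attrs[name]):
                return False
            self._attrs[name] = new_value(self._attrs[name])
            return True

    def delete_if(self, name, predicate):
        """Delete the attribute only if predicate(old) holds; return True if deleted."""
        with self._lock:
            if name not in self._attrs or not predicate(self._attrs[name]):
                return False
            del self._attrs[name]
            return True


store = AttrStore()
store.set("some_attribute", 0)
# update some_attribute=some_attribute+1 where some_attribute >= 0
applied = store.update_if("some_attribute", lambda v: v >= 0, lambda v: v + 1)
# delete some_attribute where some_attribute=0 (refused here: the value is now 1)
deleted = store.delete_if("some_attribute", lambda v: v == 0)
```

The check and the write happen under the same lock, which is the property the question asks for; two callers racing on `update_if` cannot both act on the same old value.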

Re: [Pacemaker] Configuration recommandations for (very?) large cluster

2014-08-13 Thread Lars Ellenberg
strcmp(op, CRM_OP_JOIN_ANNOUNCE) == 0) { - return I_NODE_JOIN; + if (our_dc_prio == INT_MIN) { + char * dc_prio_str = getenv("HA_dc_prio"); + + if (dc_prio_str == NULL) { + our_dc_prio = 1; + } else { +

Re: [Pacemaker] Multiple node loadbalancing

2014-07-30 Thread Lars Ellenberg
(if you intend to keep your sanity). > I would really appreciate any suggestions on this or even > links where I can find the information would be appreciated. Use pacemaker. Whether you want heartbeat or corosync as the communication and membership layer is up to you. For new instal

Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

2014-07-07 Thread Lars Ellenberg
"bad" (from the shooting node's point of view) so they would just keep killing each other then. "Don't do that." But tell the cluster to not even attempt to promote, unless the local data is known to be UpToDate *and* the remote data is either known (DRBD is connecte

Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

2014-07-04 Thread Lars Ellenberg
ged=true > pcs -f dc_cfg resource create DCVM ocf:heartbeat:VirtualDomain \ > config=/etc/libvirt/qemu/dc.xml migration_transport=tcp > migration_network_suffix=-10g \ > hypervisor=qemu:///system meta allow-migrate=false target-role=Started > is-managed=true \ >

Re: [Pacemaker] drbd + lvm

2014-06-12 Thread Lars Ellenberg
ur lvm.conf: filter = [ "a|^/dev/your/system/PVs|", "a|^/dev/drbd|", "r|.|" ] > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM > > Keep your volume_list the way it is and use the 'exclusive=true' LVM >

Re: [Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online

2014-06-12 Thread Lars Ellenberg
m-unfence-peer.sh; > } disk { fencing resource-and-stonith; } > > net { > allow-two-primaries; > after-sb-0pri discard-zero-changes; > after-sb-1pri discard-secondary; > after-sb-2pri disconnect; > } > on testvm1 {

Re: [Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain

2014-06-12 Thread Lars Ellenberg
emaker level, *and* use fencing resource-and-stonith + crm-fence-peer.sh on the DRBD level. You may want to use the "adjust-master-score" parameter of the DRBD resource agent as well, to avoid pacemaker attempting to promote an "only Consistent" DRBD, which will usually fail anyways.

Re: [Pacemaker] no-quorum-policy = demote?

2014-04-11 Thread Lars Ellenberg
that node attribute) in your promote action, refuse to promote if no quorum sleep 3*T (+ time to demote) only then actually promote. That way, you are "reasonably" sure that, before you actually promote, the former master had a chance to notice quorum loss and de
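
The promote sequence described above (refuse without quorum, wait, only then promote) can be sketched in Python as follows. This is an illustrative sketch, not resource agent code: `guarded_promote` and its parameters are hypothetical names, and `t_seconds` stands for whatever interval T the truncated part of the message defines.

```python
import time


def guarded_promote(has_quorum, do_promote, t_seconds, demote_seconds=0.0,
                    sleep=time.sleep):
    """Promote only with quorum, and only after waiting 3*T plus demote time.

    has_quorum and do_promote are caller-supplied callables (assumptions);
    sleep is injectable so the delay can be checked without actually waiting.
    """
    if not has_quorum():
        return False  # refuse to promote if there is no quorum
    # Give the former master a chance to notice quorum loss and demote.
    sleep(3 * t_seconds + demote_seconds)
    do_promote()
    return True


# Usage with stand-in callables and a recording fake for sleep.
slept = []
promoted = []
refused = guarded_promote(lambda: False, lambda: promoted.append("x"), 10,
                          sleep=slept.append)
accepted = guarded_promote(lambda: True, lambda: promoted.append("x"), 10, 5,
                           sleep=slept.append)
```

The wait is what makes this only "reasonably" sure, as the message says: it bounds the overlap by time rather than by confirmation from the peer.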

Re: [Pacemaker] Colocation constraint to External Managed Resource

2013-10-15 Thread Lars Ellenberg
esource stays at: > >> > >> Clone Set: CLONE-percona [mysql-percona] (unmanaged) > >> mysql-percona:0(lsb:mysql):Started NODE1(unmanaged) > >> Stopped: [ mysql-percona:1 ] > >> > >>But mysql is running: > >> > >>[root@NODE2~]#

Re: [Pacemaker] Corosync hanging during stop

2013-10-11 Thread Lars Ellenberg
4 > corosynclib-1.4.1-7.el6.x86_64 > corosync-1.4.1-7.el6.x86_64 > resource-agents-3.9.2-12.el6.x86_64 > cluster-glue-libs-1.0.5-6.el6.x86_64 > cluster-glue-1.0.5-6.el6.x86_64 > > Thanks for any hints ! > > Kind regards, > Detlef -- : Lars Ellenberg : LINBIT | Your W

Re: [Pacemaker] Colocation constraint to External Managed Resource

2013-10-11 Thread Lars Ellenberg
:0(lsb:mysql):Started NODE1 (unmanaged) > mysql-percona:1(lsb:mysql):Started NODE2 (unmanaged) > > Same thing happens when I reboot NODE2 (or other way around). > > --- > > I would expect that crm_mon ALWAYS reflects the local state, however >

Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-10 Thread Lars Ellenberg
ters like me? ;-) Why not? That's what release candidates are intended for. You'd only have to confirm that it works for you now. Respectively, that it still does not, in which case you better report that now than after the release, right? -- : Lars Ellenberg : LINBIT | Your Way to H

Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-09 Thread Lars Ellenberg
On Mon, Sep 09, 2013 at 02:42:45PM +1000, Andrew Beekhof wrote: > > On 06/09/2013, at 5:51 PM, Lars Ellenberg wrote: > > > On Tue, Aug 27, 2013 at 06:51:45AM +0200, Andreas Mock wrote: > >> Hi Andrew, > >> > >> as this is a real showstopper at the mom

Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-06 Thread Lars Ellenberg
03-test results in the > > following: > > > > --8<- > > Last updated: Mon Aug 26 19:29:38 2013 Last change: Mon Aug 26 > > 19:29:28 2013 via cibadmin on dis04-test > > Stack: cman > > Current DC: dis03-test

Re: [Pacemaker] ha_logd and logfile rotation

2013-08-07 Thread Lars Ellenberg
pposed to handle SIGHUP by re-opening the log files. If it does not do that for you, upgrade. If it still does not do that, complain again ;-) -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linb

Re: [Pacemaker] Multi-node resource dependency

2013-07-19 Thread Lars Ellenberg
nk attribute NODE2 set color slime attribute NODE4 set color slime crm configure colocation c-by-color inf: rsc_a rsc_b rsc_c node-attribute=color The "implicit default" node-attribute is #uname ... so using "color" the resources only need to run on node
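
As a rough illustration of what a colocation on `node-attribute=color` permits, the Python sketch below checks a placement: the resources may sit on different nodes as long as those nodes carry the same value for the attribute. `colocated_by_attribute` is a hypothetical helper, not part of Pacemaker or the crm shell, and the colour assigned to NODE1 is invented for the example.

```python
def colocated_by_attribute(placement, node_attrs, attribute):
    """True if every placed resource sits on a node with the same attribute value."""
    values = set()
    for node in placement.values():
        values.add(node_attrs.get(node, {}).get(attribute))
    # Exactly one distinct value, and no node lacking the attribute.
    return len(values) == 1 and None not in values


node_attrs = {
    "NODE1": {"color": "other"},  # hypothetical value, not from the thread
    "NODE2": {"color": "slime"},
    "NODE4": {"color": "slime"},
}
# Different nodes, same colour: satisfies the constraint.
spread = {"rsc_a": "NODE2", "rsc_b": "NODE4", "rsc_c": "NODE2"}
# One resource on a node of another colour: violates it.
mixed = {"rsc_a": "NODE1", "rsc_b": "NODE4", "rsc_c": "NODE2"}
```

With the implicit default `#uname`, every node has a unique value, so the same check degenerates to "all on the same node".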

Re: [Pacemaker] crond on both nodes (active/passive) but some jobs on active only

2013-07-05 Thread Lars Ellenberg
.disable > > > > i read anywhere crond ignores files with dot. > > > > but new experience: crond needs to restarted or signalled. > > > > how this is done best within pacemaker ? > > is clone for me ? > > > > > > thanks in advance > > andrea

Re: [Pacemaker] Monitor and standby

2013-07-03 Thread Lars Ellenberg
On Wed, Jul 03, 2013 at 12:56:35PM +0200, Denis Witt wrote: > On Wed, 3 Jul 2013 12:35:34 +0200 > Lars Ellenberg wrote: > > > Maybe you and pacemaker disagree about the meaning of "standby"? > > Hi Lars, > > obviously, yes. My understanding was that a

Re: [Pacemaker] Monitor and standby

2013-07-03 Thread Lars Ellenberg
instance. Note that a pacemaker node in "standby" is supposed to not run any resources, so if it notices that DRBD is "running" there (in Secondary), it will stop it, too. Maybe you and pacemaker disagree about the meaning of "standby"? --

Re: [Pacemaker] drbd on passive node not started

2013-07-03 Thread Lars Ellenberg
> maybe someone experienced can have a look into logs ? The logs you provide clearly show that pacemaker *did* start DRBD, and successfully. Wrong timeframe? Lars -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRB

Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-29 Thread Lars Ellenberg
fficiently unlikely" to just conclude that it is in fact dead. Rather that than a fencing method that returns "yes, I rebooted that node" when in fact that node did not even notice... > Using two separate UPSes and two separate PDUs to feed either PSU

Re: [Pacemaker] owership of created symlink

2013-06-04 Thread Lars Ellenberg
ership and permissions of the directory matter. once mounted, do "chown / chmod" on "/mnt/mirror/var/mail/." Also make sure the uid/gid is the same on all nodes. > but old problem euid=5xx egid=8 (mail) can not create lock file > /var/mail/.lock > > please hel

Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-15 Thread Lars Ellenberg
On Wed, May 15, 2013 at 03:34:14PM +0200, Dejan Muhamedagic wrote: > On Tue, May 14, 2013 at 10:03:59PM +0200, Lars Ellenberg wrote: > > On Tue, May 14, 2013 at 09:59:50PM +0200, Lars Ellenberg wrote: > > > On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff

Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-14 Thread Lars Ellenberg
On Tue, May 14, 2013 at 09:59:50PM +0200, Lars Ellenberg wrote: > On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff wrote: > > Hi, > > > > crm tells me it is version 1.2.4 > > pacemaker tells me it is version 1.1.9 > > > > So it should work

Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-14 Thread Lars Ellenberg
single argument against allowed lifetime values (reboot, forever), and assume it is supposed to be a node name otherwise? Then the error would become ERROR: unknown node name: node1 Which is probably more useful most of the time. Dejan? -- : Lars Ellenberg : LINBIT | Your Way to High Availabilit
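
The suggested argument handling can be sketched in Python as below. This is not the crm shell's code: `classify_single_arg` and `known_nodes` are hypothetical names, and only the two lifetime values and the error text are taken from the message.

```python
LIFETIMES = ("reboot", "forever")  # allowed lifetime values named in the message


def classify_single_arg(arg, known_nodes):
    """Treat a lone argument as a lifetime if it is one, else as a node name."""
    if arg in LIFETIMES:
        return ("lifetime", arg)
    if arg in known_nodes:
        return ("node", arg)
    # Anything else is assumed to be a mistyped or unknown node name.
    raise ValueError("unknown node name: %s" % arg)


nodes = {"node2", "node3"}  # hypothetical cluster membership
as_lifetime = classify_single_arg("reboot", nodes)
as_node = classify_single_arg("node2", nodes)
try:
    classify_single_arg("node1", nodes)
    message = None
except ValueError as err:
    message = "ERROR: %s" % err
```

The trade-off is that a node literally named `reboot` or `forever` could not be addressed with a single argument.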

Re: [Pacemaker] Exchanging data between resource agent instances

2013-03-19 Thread Lars Ellenberg
dev size I use blockdev --getsize64 device_name > The problem is, when I'm using DRBD, that blockdev fails on slave device. Well, then use awk '/ drbd0$/ { print $3 * 1024 }' /proc/partitions No? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and
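
For readers who would rather not shell out to awk, the same lookup can be sketched in Python. It relies on the same facts as the one-liner: `/proc/partitions` lists major, minor, size in 1 KiB blocks, and name. `device_size_bytes` is a hypothetical helper, and the sample text is made up for the example.

```python
def device_size_bytes(partitions_text, name):
    """Return the size in bytes of the named device, or None if not listed.

    partitions_text is the content of /proc/partitions; the third column
    is the size in 1 KiB blocks, hence the multiplication by 1024.
    """
    for line in partitions_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[3] == name and fields[2].isdigit():
            return int(fields[2]) * 1024
    return None


# Made-up sample in the layout of /proc/partitions.
SAMPLE = """major minor  #blocks  name

   8        0  488386584 sda
 147        0    1048576 drbd0
"""

size = device_size_bytes(SAMPLE, "drbd0")
```

On a real node the text would come from `open("/proc/partitions").read()`; unlike `blockdev --getsize64`, this does not need to open the device, so it also works where opening it is refused.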

Re: [Pacemaker] Exchanging data between resource agent instances

2013-03-18 Thread Lars Ellenberg
the idea of putting that size in the cib. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/

Re: [Pacemaker] Block stonith when drbd inconsistent

2013-03-11 Thread Lars Ellenberg
be unmounted. This problem > > is causing 90% of fencing for me. So it is not DRBD failing to demote (go secondary), but Filesystem failing to stop (umount), that is causing your failures and fencing. Why do you have "blocked processes"? Maybe it is just a matter of choosing

Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1

2013-03-06 Thread Lars Ellenberg
> meta-disk internal; > > I.e., what goes in the "disk /dev/?;"? Would it be "disk > /dev/md2_crypt;"? Yes. > And can we do our setup on an existing Encrypted RAID1 setup Yes. > (if we do pvcreate on drbd1, we get errors)? Huh? -- : Lars Ellenberg

Re: [Pacemaker] Trouble with ocf:Squid resource agent

2013-02-08 Thread Lars Ellenberg
On Fri, Feb 08, 2013 at 11:21:15AM +0100, Lars Ellenberg wrote: > On Mon, Aug 13, 2012 at 02:07:46PM +0200, Dejan Muhamedagic wrote: Apologies, I did not look at the date of the post. For some reason it appeared as "first unread", and I assumed it was recent. D'oh. :-) >

Re: [Pacemaker] Trouble with ocf:Squid resource agent

2013-02-08 Thread Lars Ellenberg
0-9]+\.[0-9]+:'$SQUID_PORT' > > > > |tcp.*:::'$SQUID_PORT' )/{ This is supposed to be fixed as well in the current version of that script... > Yes. If somebody opens a bugzilla at LF > (https://developerbugs.linuxfoundation.org/) or an issue at > https://gi

Re: [Pacemaker] unable to load drbd module

2012-09-17 Thread Lars Ellenberg
is matching your kernel, or compile yourself against matching kernel headers. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. __

Re: [Pacemaker] drbd under pacemaker - always get split brain

2012-07-12 Thread Lars Ellenberg
01:33 vmnci20 pengine: [3568]: notice: LogActions: Start > > drbd-sas0:1(vmnci21) > > > > > > And it even is promoted right away: > > Jul 10 06:01:36 vmnci20 pengine: [3568]: notice: LogActions: Promote > > drbd-sas0:1(Slave -> Master vmnci21) >

Re: [Pacemaker] drbd under pacemaker - always get split brain

2012-07-12 Thread Lars Ellenberg
produce data divergence. Not surprisingly, that is exactly what you get. Fix your problem. See above; hint: fencing resource-and-stonith, crm-fence-peer.sh + stonith_admin, add stonith, maybe add a third node so you don't need to ignore quorum

Re: [Pacemaker] Two slave nodes, neither will promote to Master

2012-07-03 Thread Lars Ellenberg
- meta target-role="Master" master-max="1" master-node="1" clone-max="2" clone-node-max="1" notify="true" and that: - colocation services_on_master inf: Services ms_nfs:Master - order fs_before_services inf: ms_nfs:promote Serv

Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

2012-06-19 Thread Lars Ellenberg
Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_trigger_update: Sending > flush op to all hosts for: p_ping (2000) > Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_perform_update: Sent > update 165: p_ping=2000 And there it is back on 2000 again ... Lars -- : L

Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

2012-06-19 Thread Lars Ellenberg
On Tue, Jun 19, 2012 at 11:12:46AM -0500, Andrew Martin wrote: > Hi Emmanuel, > > > Thanks for the idea. I looked through the rest of the log and these > "return code 8" errors on the ocf:linbit:drbd resources are occurring > at other intervals (e.g. today) when the VirtualDomain resource is > u

Re: [Pacemaker] Changing name/location of resource script

2012-06-12 Thread Lars Ellenberg
to 120 resources, I would like find a way > to automate it a bit more, but have not been able to find an easy way > to make the change on the command line. crm configure edit, then :%s/// ... but wait ... crm configure help filter careful, that one is a bit tricky to get right. > > An

Re: [Pacemaker] Announce: pcs / pcs-gui (Pacemaker/Corosync Configuration System)

2012-06-08 Thread Lars Ellenberg
On Wed, Jun 06, 2012 at 07:22:47PM +0200, Rasto Levrinc wrote: > On Wed, Jun 6, 2012 at 4:45 PM, Lars Ellenberg > wrote: > > On Tue, Jun 05, 2012 at 05:15:04PM +0200, Rasto Levrinc wrote: > >> On Tue, Jun 5, 2012 at 1:27 PM, Lars Marowsky-Bree wrote: > >> > On 20

Re: [Pacemaker] Finer control over when email is sent?

2012-06-08 Thread Lars Ellenberg
ating to failures, and just to start and > > stop events on our IP addresses? > > > > I checked the documentation and man pages and didn't see anything > > immediate, but I wanted to make sure I hadn't overlooked any opt

Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-06-08 Thread Lars Ellenberg
On Mon, Jun 04, 2012 at 11:33:45AM +1000, Andrew Beekhof wrote: > On Mon, Jun 4, 2012 at 11:28 AM, Andrew Beekhof wrote: > > On Fri, May 25, 2012 at 7:48 PM, Florian Haas wrote: > >> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg > >> wrote: > >>> O

Re: [Pacemaker] Announce: pcs / pcs-gui (Pacemaker/Corosync Configuration System)

2012-06-06 Thread Lars Ellenberg
On Tue, Jun 05, 2012 at 05:15:04PM +0200, Rasto Levrinc wrote: > On Tue, Jun 5, 2012 at 1:27 PM, Lars Marowsky-Bree wrote: > > On 2012-06-05T09:43:09, Andrew Beekhof wrote: > > > >> Every argument made so far applies equally to HAWK and the Linbit GUI, > >> yet there was no outcry when they were

[Pacemaker] Adding a new node in standby.

2012-05-25 Thread Lars Ellenberg
add the node to the cib, and set it to standby, > > before it even joins for the first time. > > Haha, good one. > > Wait, you weren't joking? Nope. "Works for me". Not that I do that very often, but I did, and it worked. -- : Lars Ellenberg : LINBIT | Your Way to

Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 09:05:54PM +1000, Andrew Beekhof wrote: > On Fri, May 25, 2012 at 7:48 PM, Florian Haas wrote: > > On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg > > wrote: > >> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote: > >>> On

Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote: > On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg > wrote: > > Sorry, sent too early. > > > > That would not catch the case of cluster partitions joining, > > only the pacemaker startup with fully c

Re: [Pacemaker] Debug message granularity

2012-05-25 Thread Lars Ellenberg
On Wed, May 23, 2012 at 08:37:44AM +1000, Andrew Beekhof wrote: > On Tue, May 22, 2012 at 9:51 PM, Ron Kerry wrote: > > On 5/22/12 3:33 AM, Andrew Beekhof wrote: > >>> > >>> and I see nothing in > >>> >  pacemaker itself that gives me any separate controls over its logging > >>> >  verbosity. > >>

Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 10:29:58AM +0200, Lars Ellenberg wrote: > On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote: > > On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg > > wrote: > > > On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote: >

Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote: > On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg > wrote: > > On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote: > >> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg > >> wrote: >

Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-24 Thread Lars Ellenberg
On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote: > On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg > wrote: > > > > People sometimes think they have a use case > > for influencing which node will be the DC. > > Agreed :-) > > > > >

Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-24 Thread Lars Ellenberg
IO backend? Did you post your drbd configuration setings already? > > > > After reenabling the secondary node the DRBD synchronization is quite slow. > > > > > >>> > >>> Has anyone an idea what could cause such problems? I have no idea for > >

[Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-03 Thread Lars Ellenberg
,pcmk_env.syslog, 1); setenv("HA_use_logd", pcmk_env.use_logd, 1); + setenv("HA_dc_prio",pcmk_env.dc_prio, 1); if(pcmk_env.logfile) { setenv("HA_debugfile", pcmk_env.logfile, 1); } --- ./lib/ais/uti

Re: [Pacemaker] ERROR: te_graph_trigger: Transition failed: terminated pacemaker's problem or mine?

2012-04-30 Thread Lars Ellenberg
On Mon, Apr 30, 2012 at 01:00:11PM +1000, Andrew Beekhof wrote: > On Sat, Apr 28, 2012 at 5:40 AM, Lars Ellenberg > wrote: > > On Fri, Apr 27, 2012 at 11:31:23AM +0100, Tim Small wrote: > >> Hi, > >> > >> I'm trying to get to the bottom of a problem I

Re: [Pacemaker] ERROR: te_graph_trigger: Transition failed: terminated pacemaker's problem or mine?

2012-04-27 Thread Lars Ellenberg
so-VE > colocation calypso-VE-with-calypso-FS inf: calypso-VE calypso-FS > colocation epione-FS-on-essex02-LVM inf: epione-FS essex02-LVM > colocation epione-FS-with-essex02-LVM inf: epione-FS essex02-LVM > colocation epione-SendArp-with-epione-VE inf: epione-SendArp epione-VE > colo

Re: [Pacemaker] Sporadic problems of rejoin after split brain situation

2012-03-20 Thread Lars Ellenberg
:31 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP: > trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3 > Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail:CURRENT: oan1 > [nodeid=0, born=15] > Mar 16 01:35:31 oan1 cib: [7597]: info: cib_process_request: Oper

Re: [Pacemaker] Surprisingly fast start of resources on cluster failover.

2012-03-07 Thread Lars Ellenberg
comes pacemaker starts a resources twice as fast > than I do from CLI ? Other than above suggestion, did you verify that it ends up doing the same thing when started from pacemaker, compared to when started by you from commandline? Did you compare the results? -- : Lars Ellenbe

Re: [Pacemaker] Can Master/Slave resource transit from Master to Stopped directly ?

2012-03-07 Thread Lars Ellenberg
ke advantage So, a "start" could then return $OCF_RUNNING_MASTER to indicate that it went straight into Master mode, and a "demote" would be able to indicate it went straight into Stopped state by returning $OCF_NOT_RUNNING. No idea

Re: [Pacemaker] Failing to move around IPaddr2 resource

2012-02-24 Thread Lars Ellenberg
My configuration is here, in case there's anything wrong with it. > > Looks like you forgot to attach it. > > > > > Anlu > > > > > > ___ > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > > http://oss.clu

Re: [Pacemaker] How to run heartbeat and pacemaker resources as a non-root user

2012-02-24 Thread Lars Ellenberg
which I need to run my different custom > > applications  (configured using crm)  as a non root user. > > Can this be done? > > "su - otheruser" in the resource agent > have a look in the existing agents for how they do it Maybe we should add a "user" option

Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Lars Ellenberg
this one: > > > > location you-name-it resource-id \ > > rule $role=Master -inf: \ > > #uname ne node-where-it-should-be-master > > These constraints would prevent the MS resource to run in Master state even > on > that node.

Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Lars Ellenberg
rule $role=Master -inf: \ #uname ne node-where-it-should-be-master Cheers, Lars --- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com > The only method I've found to safely and reliable migrate a mul

Re: [Pacemaker] Where is MAXMSG defined?

2012-02-07 Thread Lars Ellenberg
If you are asking about what I think you do, then that would be in glue, include/clplumbing/ipc.h But be careful, when fiddling with it. What are you trying to solve, btw? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com __

Re: [Pacemaker] How to start resources in a Resource Group in parallel

2012-02-02 Thread Lars Ellenberg
On Thu, Feb 02, 2012 at 08:28:16PM +1100, Andrew Beekhof wrote: > On Tue, Jan 31, 2012 at 9:52 PM, Dejan Muhamedagic > wrote: > > Hi, > > > > On Tue, Jan 31, 2012 at 10:29:14AM +, Kashif Jawed Siddiqui wrote: > >> Hi Andrew, > >> > >>           It is the LRMD_MAX_CHILDREN limit which by defau

Re: [Pacemaker] don't want to restart clone resource

2012-02-01 Thread Lars Ellenberg
clone-max. Did you file a bugzilla? Has that made progress? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com

Re: [Pacemaker] How to flush the arp cache of a router?

2012-01-31 Thread Lars Ellenberg
have any "IPaddr2.*: ERROR: Could not send gratuitous arps" in your logs? Maybe replacing the call to send_arp with calls to arping will do, as I described in this thread: http://www.gossamer-threads.com/lists/linuxha/pacemaker/58444 -- : Lars

Re: [Pacemaker] How to start resources in a Resource Group in parallel

2012-01-31 Thread Lars Ellenberg
1f00ab4fab But since you are using SLES, why not complain there, and have them add it for you? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com

Re: [Pacemaker] How to deal with unix signals in a glib mainloop (was: [Problem] The attrd does not sometimes stop.)

2012-01-19 Thread Lars Ellenberg
On Fri, Jan 20, 2012 at 09:21:58AM +1100, Angus Salkeld wrote: > On 19/01/12 22:23 +0100, Lars Ellenberg wrote: > >On Tue, Jan 17, 2012 at 12:13:37AM +0100, Lars Ellenberg wrote: > >>On Tue, Jan 17, 2012 at 09:52:35AM +1100, Andrew Beekhof wrote: > >>> >

Re: [Pacemaker] How to deal with unix signals in a glib mainloop (was: [Problem] The attrd does not sometimes stop.)

2012-01-19 Thread Lars Ellenberg
On Tue, Jan 17, 2012 at 12:13:37AM +0100, Lars Ellenberg wrote: > On Tue, Jan 17, 2012 at 09:52:35AM +1100, Andrew Beekhof wrote: > > > > Ok, done: > > > > https://github.com/beekhof/pacemaker/commit/2a6b296 > > > > If I'm adding voodoo, I at least w

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-16 Thread Lars Ellenberg
ed so it > can be removed again if the reason goes away. That about sums it up, then ;-) -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-16 Thread Lars Ellenberg
On Mon, Jan 16, 2012 at 11:42:32PM +1100, Andrew Beekhof wrote: > >>> http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs > >>> > >>> iiuc, mainloop does something similar to (oversimplified): > >>>        timeout = -1; /* infinity */ > >>>        for s in all GSource >

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-16 Thread Lars Ellenberg
ry and mainloop already doing the poll stage could potentially be solved by using cl_signal_set_interrupt(SIGTERM, 1), which would mean we can condense the prepare to if (trig->trigger) *timeout = 0; return trig->trigger; Glue (and heartbeat) code base is not tha

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-14 Thread Lars Ellenberg
iseconds. */ +else + *timeout = 5000; /* arbitrary */ return trig->trigger; } This scenario does not let the blocked IPC off the hook, though. That is still possible, both for blocking send and blocking receive, so that should probably be fixed as well, somehow. I'm

Re: [Pacemaker] [Question] About the rotation of the pe-file.

2012-01-14 Thread Lars Ellenberg
return; if (sequence > max) sequence = 0; -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-12-29 Thread Lars Ellenberg
I understood it. > > I try the operation of the patch in our environment. > > > > To Alan: Will you try a patch? > > > > Best Regards, > > Hideo Yamauchi. > > > > --- On Tue, 2011/11/15, Dejan Muhamedagic wrote: > > > > > Hi, > >

Re: [Pacemaker] Remote CRM shell from LCMC

2011-12-28 Thread Lars Ellenberg
On Wed, Dec 28, 2011 at 12:57:33AM +0100, Rasto Levrinc wrote: > Hi, > > this being a slow news day, There is this great new feature in LCMC, but > probably completely useless. :) The LCMC used to show for testing purposes > the CRM shell configuration, but people started to use it, so I left it >

Re: [Pacemaker] don't want to restart clone resource

2011-12-16 Thread Lars Ellenberg
, then > > > > the clone resource(node-app-rsc:2) running on the node-2 will restart and > > change to "node-app-rsc:1". > > > > You know, the node-app-rsc is my application, and I don't want it to > > restart. > > > > How could I do, Please? > >

Re: [Pacemaker] [Linux-HA] Antw: Re: Q: unmanaged MD-RAID & auto-recovery

2011-11-25 Thread Lars Ellenberg
On Fri, Nov 25, 2011 at 01:54:33PM +0100, Florian Haas wrote: > On 11/25/11 13:29, Lars Ellenberg wrote: > >> From the log snippet it's > >> not entirely clear whether that's a recurring monitor (interval == > >> whatever you configured, or 20 if default),

Re: [Pacemaker] Syntax highlighting in vim for crm configure edit

2011-11-22 Thread Lars Ellenberg
answer questions ;-) Not perfect, either. Probably "detects" many more errors than necessary, and does not detect some that would be nice to have detected. (brace errors, quotation errors ...) But if there should be some vim syntax wizard out there, maybe our two attempts at doing it can

Re: [Pacemaker] IPv6addr failure loopback interface

2011-11-17 Thread Lars Ellenberg
ocf_log warn "Stop success." > return $OCF_SUCCESS > else > ocf_log err "Failed to stop." > return $OCF_ERR_GENERIC > fi > else > # was not running, so stop can

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-11-14 Thread Lars Ellenberg
On Mon, Nov 14, 2011 at 11:58:09AM +1100, Andrew Beekhof wrote: > On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg > wrote: > > On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote: > >> On Tue, Oct 18, 2011 at 12:19 PM,   wrote: > >> > Hi, > >

Re: [Pacemaker] [Drbd-dev] crm_attribute --quiet (was Fwd: [Linux-HA] Should This Worry Me?)

2011-11-14 Thread Lars Ellenberg
> > > Should ocf:linbit:drbd be using "-q"? > > Correct. Sorry about that. -Q is still accepted, though. As it is accepted for a larger range of crm_attribute versions, I'll keep it for now. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/

Re: [Pacemaker] Newcomer's question - API?

2011-11-07 Thread Lars Ellenberg
n case I accidentally copied some of it in > recreating it. You know, there are effectively no more than two entities you need to talk to, if you wanted the LCMC under some non-GPL licence. Which is Rasto, and LINBIT. Just a thought... -- : Lars Ellenberg : LINBIT | Your Way to High Availa

Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-11-06 Thread Lars Ellenberg
_IPC_DELAY from crm.h) to be actually processed, as the signal handler only raises a flag for the next mainloop iteration. If a (non-fatal) signal is delivered every few seconds, then the goto loop will never time out. Please someone check this for plausibility ;-) -- : Lars Ellenber
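
The effect described in that snippet can be illustrated with a small simulation. This is only a sketch, not the attrd or Pacemaker code: it assumes that the wait starts a fresh timeout every time it is interrupted, which is one plausible reading of "the goto loop will never time out". The function names, the 5 second timeout and the 3 second signal period are made up for the illustration. A second variant keeps one fixed deadline, a common way to avoid the problem.

```python
def wait_restarting(timeout, signal_times, horizon):
    """Simulated wait that starts a fresh timeout after every interruption.

    Returns the simulated time at which the wait times out, or None if it
    has not timed out by `horizon`. (Hypothetical model, not Pacemaker code.)
    """
    now = 0.0
    while now < horizon:
        expiry = now + timeout
        # Signals that arrive before the current timeout would expire.
        interrupts = [t for t in signal_times if now < t < expiry]
        if not interrupts:
            return expiry if expiry <= horizon else None
        # Interrupted: jump back and wait for the full timeout again.
        now = min(interrupts)
    return None


def wait_with_deadline(timeout, signal_times, horizon):
    """Simulated wait that computes its deadline once, before the loop."""
    deadline = timeout  # fixed at t = 0 and never extended
    now = 0.0
    while now < horizon:
        interrupts = [t for t in signal_times if now < t < deadline]
        if not interrupts:
            return deadline if deadline <= horizon else None
        # Interrupted: retry, but only for the time left until the deadline.
        now = min(interrupts)
    return None


# One signal every 3 simulated seconds, timeout of 5 seconds.
signals = list(range(3, 61, 3))
print(wait_restarting(5.0, signals, 60.0))
print(wait_with_deadline(5.0, signals, 60.0))
```

With signals arriving more often than the timeout, the restarting variant never reaches its expiry within the simulated horizon, while the fixed-deadline variant gives up after the original timeout.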

Re: [Pacemaker] location setting with parenthesis

2011-11-03 Thread Lars Ellenberg
On Thu, Nov 03, 2011 at 09:30:45PM +0100, Andreas Kurz wrote: > On 11/03/2011 12:38 PM, Lars Ellenberg wrote: > > On Thu, Nov 03, 2011 at 07:23:01PM +0900, 池田 淳子 wrote: > >> Hi, > >> > >>>> location rsc_location-1 msDRBD \ > >>>> rule

Re: [Pacemaker] location setting with parenthesis

2011-11-03 Thread Lars Ellenberg
ySQL:0 and master-prmMySQL:0 gt 0 \ > rule role=master -inf: defined master-prmMySQL:1 and master-prmMySQL:1 gt 0 I may be missing something obvious, but why not a colocation constraint between msDRBD and prmMySQL? something like colocation asdf -inf: msDRBD:Master prmMySQL:Master -- :

Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Lars Ellenberg
On Thu, Oct 13, 2011 at 06:35:27AM -0600, Serge Dubrouski wrote: > On Thu, Oct 13, 2011 at 4:29 AM, Lars Ellenberg > wrote: > > > On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote: > > > On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic > > wrote: > >

Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Lars Ellenberg
monitor: unexpected operator > > > > This error is actually reported with any operator. I tried to start the > > script from CLI, I got the same thing with ./pgsql start, ./pgsql status, > > ./pgsql stop > > > > Weird. I don't know what to tell. The RA is

Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-13 Thread Lars Ellenberg
On Wed, Oct 12, 2011 at 08:08:21PM -0400, Yves Trudeau wrote: > What about referring to the git repository here: > > http://www.clusterlabs.org/wiki/Get_Pacemaker#Building_from_Source http://www.clusterlabs.org/mwiki/index.php?title=Install&diff=1287&oldid=1282 Lars

Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-12 Thread Lars Ellenberg
On Thu, Oct 13, 2011 at 01:21:46AM +0200, Lars Ellenberg wrote: > On Wed, Oct 12, 2011 at 05:09:45PM -0400, Yves Trudeau wrote: > > Hi Florian, > > > > On 11-10-12 04:09 PM, Florian Haas wrote: > > >On 2011-10-12 21:46, Yves Trudeau wrote: > > >>H

Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-12 Thread Lars Ellenberg
ting one" (namely, manage only replication, > >not the underlying daemon), then I guess I can't argue with that, but > >I'd still believe that would be a suboptimal approach. > Ohh... don't get me wrong, I am not the kind of guy that takes > pride in having re-inve

Re: [Pacemaker] nginx OCF script - strange syslog output

2011-10-12 Thread Lars Ellenberg
aker > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch
