Re: [Pacemaker] resources show as running on all nodes right after adding them

2012-03-28 Thread Bernd Schubert
On 03/28/2012 04:39 PM, Florian Haas wrote: [...] Clearly this resource is not running on all nodes, so why is it being reported as such? Probably because your resource agent reports OCF_SUCCESS on a probe operation when it ought to be returning OCF_NOT_RUNNING. Pastebin the source of ocf:hydra
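A probe is a one-off monitor action run on a node where the resource is not expected to be active, so "not running" is the normal answer there. Below is a minimal sketch in C of the exit-code mapping a monitor/probe should perform. It assumes the agent has already worked out the resource state (pid file, process table, mount table, ...). `ocf_monitor` and `enum rsc_state` are hypothetical names. The numeric values follow the OCF resource agent API, and the same mapping applies to a shell agent's monitor function.

```c
/* OCF return codes relevant to monitor/probe (values from the OCF RA API). */
enum ocf_rc {
    OCF_SUCCESS = 0,
    OCF_ERR_GENERIC = 1,
    OCF_NOT_RUNNING = 7
};

/* Hypothetical result of the agent's own status check. */
enum rsc_state { RSC_STOPPED, RSC_RUNNING, RSC_FAILED };

/* Map the observed state to the code the cluster expects.
 * A probe on a node where the resource was never started must get
 * OCF_NOT_RUNNING; answering OCF_SUCCESS there makes the resource
 * look active on that node as well. */
static enum ocf_rc ocf_monitor(enum rsc_state state)
{
    switch (state) {
    case RSC_RUNNING:
        return OCF_SUCCESS;
    case RSC_STOPPED:
        return OCF_NOT_RUNNING;
    default:
        return OCF_ERR_GENERIC;
    }
}
```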

Re: [Pacemaker] resource stickiness and preventing stonith on failback

2011-08-24 Thread Bernd Schubert
Hello Brian, On 08/23/2011 10:56 PM, Brian J. Murrell wrote: Hi All, I am trying to configure pacemaker (1.0.10) to make a single filesystem highly available by two nodes (please don't be distracted by the dangers of multiply mounted filesystems and clustering filesystems, etc., as I am absolut

Re: [Pacemaker] OCF_RESKEY_device to the device to be managed

2010-11-09 Thread Bernd Schubert
> fs_01_stop_0 (call=74, rc=2, cib-update=90, confirmed=true) invalid parameter > > Suddenly, it failed to stop! Yeah, known bug. Frequently comes up with 1.0.9, rarely with 1.0.7 (no idea about 1.0.8) and supposed to be fixed in 1.0.10. Basically, it mad

Re: [Pacemaker] detect/cleanup failed resource

2010-10-21 Thread Bernd Schubert
On Thursday, October 21, 2010, Rasto Levrinc wrote: > On Thu, October 21, 2010 12:42 pm, Bernd Schubert wrote: > > Hi all, > > is there a better way to detect a failed resource than to run "crm_mon -1 -r"? > Well, you could

[Pacemaker] detect/cleanup failed resource

2010-10-21 Thread Bernd Schubert
also no fun and also is not really fast. So I'm looking for *any* sane way to clean up resources or at least for a good parse-able way to get failed resources and the corresponding node. Thanks, Bernd -- Bernd Schubert DataDirect Networks

Re: [Pacemaker] Question: How many nodes can join a cluster?

2010-10-18 Thread Bernd Schubert
S resources, but still everything is in global pacemaker setup. We also have syslog-ng rules and a patched logd (patches sent to this list, need to update them again) to filter out all pacemaker debug logs, so that we can easily see messages from the lustre RA in syslogs. Cheers, Bernd -- Be

[Pacemaker] hgrc of the online repository

2010-10-06 Thread Bernd Schubert
Hello Andrew, any chance you could add a few lines to the .hg/hgrc of the online repository? Or to /etc/mercurial/hgrc or /etc/mercurial/hgrc.d? Reading patches is easier if function names are provided...
[diff]
git = True
nodates = True
showfunc = True
Thanks, Bernd -- Bernd Schubert

[Pacemaker] [PATCH 9/8 ] errata, only close file descriptors if that had been opened

2010-09-16 Thread Bernd Schubert
cl_log: Only close file descriptors if that had been opened This patch also could be merged with the 6th patch in the series (restore old open/write/close semantics). It fixes a valgrind warning about invalid close(). Signed-off-by: Bernd Schubert diff --git a/lib/clplumbing/cl_log.c b/lib
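A minimal sketch of the guard described above, assuming the descriptor is kept in a single static variable that starts at -1. `log_fd`, `log_open` and `log_close` are hypothetical names, not the actual cl_log.c code.

```c
#include <fcntl.h>
#include <unistd.h>

/* -1 means "never opened" (or already closed). */
static int log_fd = -1;

static int log_open(const char *path)
{
    if (log_fd < 0)
        log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    return log_fd;
}

static void log_close(void)
{
    /* Only close a descriptor that was actually opened: close(-1)
     * fails with EBADF and is the kind of call valgrind flags. */
    if (log_fd >= 0) {
        close(log_fd);
        log_fd = -1; /* a second call is now a harmless no-op */
    }
}
```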

[Pacemaker] [PATCH 8/8 ] ha_logd: Add a SIGHUP signal handler to close/open log files

2010-09-15 Thread Bernd Schubert
ha_logd: Add a SIGHUP signal handler to close/open log files Without the signal handler cl_log uses inefficient IO, as it has to open/seek/flush/close the log files in order to allow cron log file rotation. Signed-off-by: Bernd Schubert diff --git a/include/clplumbing/cl_log.h b/include

Re: [Pacemaker] [PATCH 6/8 ] Restore old logfile open/seek/write/close behaviour.

2010-09-15 Thread Bernd Schubert
also uses system IO (open/close/write) instead of libc IO (fopen/fclose/fwrite). Libc IO has a buffer, which is not suitable for log files: in case of a stonith, the whole buffer, which might be large, will be missing from the log files. Signed-off-by: Bernd Schubert diff --git a/lib/clplumbing/cl_log.c
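A minimal sketch of an unbuffered append with plain open/write/close. The function name `log_line` is hypothetical. Note what this does and does not buy: a line is handed to the kernel as soon as write() returns instead of sitting in a user-space stdio buffer. It is not forced to disk; that would need an additional fsync() if it has to survive a hard power cut.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one log line without any stdio buffering.
 * Returns 0 if the whole line was written, -1 otherwise. */
static int log_line(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);  /* goes straight to the kernel */
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```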

Re: [Pacemaker] [PATCH 7/8 ] cl_log: Clean up white space

2010-09-15 Thread Bernd Schubert
cl_log: Clean up white space Signed-off-by: Bernd Schubert diff --git a/lib/clplumbing/cl_log.c b/lib/clplumbing/cl_log.c --- a/lib/clplumbing/cl_log.c +++ b/lib/clplumbing/cl_log.c @@ -161,8 +161,8 @@ cl_log_get_logdtime(void) void cl_log_set_logdtime(int logdtime

Re: [Pacemaker] [PATCH 5/8 ] ha_logd: New option to disable syslog logging

2010-09-15 Thread Bernd Schubert
ha_logd: New option to disable syslog logging As we already write ha-log and ha-debug, users might want to disable syslog logging. Signed-off-by: Bernd Schubert diff --git a/logd/ha_logd.c b/logd/ha_logd.c --- a/logd/ha_logd.c +++ b/logd/ha_logd.c @@ -91,6 +91,7 @@ static struct { int
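A sketch of how such a switch could gate the output targets, assuming the file outputs stay unconditional and syslog becomes optional. `struct logd_config`, `use_syslog` and `log_targets` are hypothetical names and do not reflect the real ha_logd option or its spelling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical logd settings. */
struct logd_config {
    bool use_syslog;        /* the new switch: false disables syslog */
    const char *logfile;    /* e.g. ha-log, NULL if unset */
    const char *debugfile;  /* e.g. ha-debug, NULL if unset */
};

enum { TO_LOGFILE = 1, TO_DEBUGFILE = 2, TO_SYSLOG = 4 };

/* Bitmask of outputs a message should be written to. */
static int log_targets(const struct logd_config *cfg)
{
    int t = 0;
    if (cfg->logfile != NULL)
        t |= TO_LOGFILE;
    if (cfg->debugfile != NULL)
        t |= TO_DEBUGFILE;
    if (cfg->use_syslog)
        t |= TO_SYSLOG;
    return t;
}
```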

[Pacemaker] [PATCH 4/8 ] cl_log: Always print the common log entity to syslog messages

2010-09-15 Thread Bernd Schubert
simple filter rules Signed-off-by: Bernd Schubert diff --git a/lib/clplumbing/cl_log.c b/lib/clplumbing/cl_log.c --- a/lib/clplumbing/cl_log.c +++ b/lib/clplumbing/cl_log.c @@ -543,7 +543,7 @@ cl_direct_log(int priority, const char* int needprivs = !cl_have_full_privs(); if
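Why a common entity helps: if every cluster message carries the same fixed tag, one syslog rule can match them all, regardless of which daemon logged the message. The sketch below uses a hypothetical message layout ("common: [entity] text") and a hypothetical helper name; the real cl_log format may differ.

```c
#include <stdio.h>

/* Build "<common>: [<entity>] <msg>" so a single filter on the
 * common prefix catches every daemon's messages.
 * Returns the length snprintf would produce (as snprintf does). */
static int format_syslog_msg(char *buf, size_t size,
                             const char *common_entity,
                             const char *entity, const char *msg)
{
    return snprintf(buf, size, "%s: [%s] %s", common_entity, entity, msg);
}
```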

Re: [Pacemaker] [PATCH 3/8 ] ha_logd: Use C99 initializers, also correct max entity string length

2010-09-15 Thread Bernd Schubert
ha_logd: Use C99 initializers, also correct max entity string length C99 initializers are easier to read. Signed-off-by: Bernd Schubert diff --git a/logd/ha_logd.c b/logd/ha_logd.c --- a/logd/ha_logd.c +++ b/logd/ha_logd.c @@ -87,18 +87,18 @@ static gboolean needs_shutdown = FALSE; static
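For readers unfamiliar with the construct, here is a small illustration of C99 designated initializers together with a bounded copy into a fixed-size entity buffer. The struct, the `MAXENTITY` value of 32 and `set_entity` are hypothetical and do not reflect the actual ha_logd definitions. Each field is named, so nobody has to count positions, and any field left out is zero-initialized. The copy keeps at most `MAXENTITY - 1` characters so the terminating NUL always fits.

```c
#include <stdio.h>

#define MAXENTITY 32   /* hypothetical limit, including the NUL */

struct logd_options {
    char entity[MAXENTITY];
    int debug;
    const char *logfile;
};

/* C99 designated initializers: fields are named, order does not matter. */
static struct logd_options opts = {
    .entity  = "logd",
    .debug   = 0,
    .logfile = NULL,
};

/* Copy a user-supplied entity, truncating so the NUL still fits. */
static void set_entity(struct logd_options *o, const char *name)
{
    snprintf(o->entity, sizeof o->entity, "%s", name);
}
```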

[Pacemaker] [PATCH 2/8] cl_log: Simplify a function

2010-09-15 Thread Bernd Schubert
cl_log: Simplify a function Signed-off-by: Bernd Schubert diff --git a/lib/clplumbing/cl_log.c b/lib/clplumbing/cl_log.c --- a/lib/clplumbing/cl_log.c +++ b/lib/clplumbing/cl_log.c @@ -545,7 +545,7 @@ cl_direct_log(int priority, const char* entity =cl_log_entity

[Pacemaker] [PATCH 1/8 ] cl_log: Make functions static and remove CircularBuffer

2010-09-15 Thread Bernd Schubert
cl_log: Make functions static and remove CircularBuffer CircularBuffer was added more than 5 years ago and is still unused, so remove the dead code; it can be retrieved from the repository history if required. Also make functions static that are only used within cl_log.c. Signed-off-by: Bernd Schubert

[Pacemaker] [PATCH 0/8 ] ha_logd and cl_log improvements

2010-09-15 Thread Bernd Schubert
Hi all, the following patches better handle bug 2470 and add some generic improvements. I'm not sure whether I should attach them to the bugzilla entry or whether the mailing list is preferred. Thanks, Bernd -- Bernd Schubert DataDirect Net

[Pacemaker] bugzilla #2480 - group-node-node crm_mon prints

2010-09-06 Thread Bernd Schubert
up-by-node. Actually, we use a wrapper that calls "crm_mon -1 -r -n" to give us the cluster status. Besides the so-far missing "unmanaged" flag, "FAILED" is another important piece of missing information. Thanks, Bernd

Re: [Pacemaker] pingd

2010-09-03 Thread Bernd Schubert
(black box), that provides for example NFS to clients. You would want to have each and every additional service mirrored again. And you could not rely on additional customer NFS clients. > > May be easier, safer, and more transparent than > no-quorum=ignore plus some ping attribute bas

Re: [Pacemaker] pingd

2010-09-02 Thread Bernd Schubert
On Thursday, September 02, 2010, Lars Ellenberg wrote: > On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote: > > On Thursday, September 02, 2010, Andrew Beekhof wrote: > > > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert > > > > > > > My proposa

Re: [Pacemaker] pingd

2010-09-02 Thread Bernd Schubert
On Thursday, September 02, 2010, Andrew Beekhof wrote: > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert > > My proposal is to rip all network code out of pingd and to add > > slightly modified files from 'iputils'. > > Close, but that's not portable. > In

[Pacemaker] [PATCH] ping RA: The host list must be provided

2010-09-01 Thread Bernd Schubert
ping RA: The host list must be provided While pingd can connect to heartbeat to get all peer nodes, the ping script RA cannot do that. Accordingly, the host list is a required argument. Signed-off-by: Bernd Schubert diff --git a/extra/resources/ping b/extra/resources/ping --- a/extra
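The validation this implies is simple: with no peer list from heartbeat, an empty host list leaves the agent nothing to ping, so it should refuse the configuration. Below is a sketch in C rather than the agent's shell. It assumes the parameter is named `host_list`, so that per the usual `OCF_RESKEY_<param>` convention it arrives as `OCF_RESKEY_host_list`. Returning `OCF_ERR_CONFIGURED` (6) is an assumption about the appropriate code. `validate_host_list` and `validate_from_env` are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>

enum { OCF_SUCCESS = 0, OCF_ERR_CONFIGURED = 6 };

/* Reject a missing, empty or whitespace-only host list. */
static int validate_host_list(const char *host_list)
{
    if (host_list == NULL)
        return OCF_ERR_CONFIGURED;
    for (const char *p = host_list; *p != '\0'; p++)
        if (!isspace((unsigned char)*p))
            return OCF_SUCCESS;   /* at least one real character */
    return OCF_ERR_CONFIGURED;
}

/* Read the parameter the way an OCF agent receives it. */
static int validate_from_env(void)
{
    return validate_host_list(getenv("OCF_RESKEY_host_list"));
}
```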

[Pacemaker] pingd

2010-09-01 Thread Bernd Schubert
on, as pingd.c includes a function from iputils ping. While the function is marked accordingly, it still does not include the original license statement, which is IMHO a clear license violation. I could probably do that quickly, but don't want to do something that is not accepted upst

Re: [Pacemaker] resource locations for cloned resources (asymmetric cluster)

2010-08-27 Thread Bernd Schubert
This is a virtual machine test cluster and I recently renamed all host names, but used the old host names for the location :( I think we should add a warning to the crm shell if a location constraint uses a host name that is not defined in the cluster. Sorry again and thanks f

Re: [Pacemaker] resource locations for cloned resources (asymmetric cluster)

2010-08-27 Thread Bernd Schubert
Sorry for the double post. While reading my own mail, I found it: I used the wrong host names in the location constraints :( That also explains why it worked on another cluster. Sorry for the noise, Bernd On Thursday, August 26, 2010, Bernd Schubert wrote: > Hi all, > >

[Pacemaker] resource locations for cloned resources (asymmetric cluster)

2010-08-26 Thread Bernd Schubert
reciated. Thanks, Bernd -- Bernd Schubert DataDirect Networks ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http:

[Pacemaker] resource locations for cloned resources (asymmetric cluster)

2010-08-26 Thread Bernd Schubert
Hi all, I'm trying to start a pingd clone resource on an asymmetric cluster. I specified locations, but it still refuses to start pingd === [r...@vrhel5-mds1 ha.d]# cat pingd.cib primitive pingdnet1 ocf:pacemaker:pingd \ params h

Re: [Pacemaker] Occasional error running ocf scripts

2010-08-13 Thread Bernd Schubert
then run into random issues all the time...). Cheers, Bernd -- Bernd Schubert DataDirect Networks

[Pacemaker] overwrite quorum decision

2010-07-29 Thread Bernd Schubert
Hello all, is there a way to override the quorum policy decision, let's say to "no quorum with n/2 - 1 nodes" or "no quorum if no access to any other node"? Thanks, Bernd
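For reference, the default rule the question wants to change is a strict majority of votes. The sketch below is a generic illustration of that arithmetic, not Pacemaker or heartbeat code. With n expected votes, quorum needs floor(n/2) + 1, so a two-node cluster already loses quorum when one node is gone.

```c
#include <stdbool.h>

/* Strict majority: more than half of the expected votes. */
static unsigned quorum_threshold(unsigned expected_votes)
{
    return expected_votes / 2 + 1;
}

static bool has_quorum(unsigned present_votes, unsigned expected_votes)
{
    return present_votes >= quorum_threshold(expected_votes);
}
```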

[Pacemaker] 1.0.9 forgets to pass arguments to the agent

2010-07-11 Thread Bernd Schubert
e: Sending flush op to all hosts for: last-failure-ost_demofs_0 (1278886216) I guess I need to file a bugzilla, but I won't have time before Wednesday. Thanks, Bernd -- Bernd Schubert DataDirect Networks

[Pacemaker] 1.0.9 forgets to pass arguments to the agent

2010-07-11 Thread Bernd Schubert
How can it happen that parameters are missing in 1.0.9? The following condition is *sometimes* triggered (in our lustre_server agent, which is a modified Filesystem agent) # It is possible that OCF_RESKEY_directory has one or even multiple trailing "/". # But the output of `mount` and /proc/
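The normalization the agent comment refers to is simply stripping trailing slashes before comparing against the mount table. The sketch below is in C instead of the agent's shell, and `strip_trailing_slashes` is a hypothetical helper. It removes every trailing "/" in place but keeps a lone "/" for the root directory. The sample path in the test is made up.

```c
#include <string.h>

/* Remove trailing '/' characters in place; "/" itself is kept. */
static char *strip_trailing_slashes(char *path)
{
    size_t len = strlen(path);
    while (len > 1 && path[len - 1] == '/')
        path[--len] = '\0';
    return path;
}
```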

[Pacemaker] crm resource cleanup ignored

2010-07-02 Thread Bernd Schubert
Hello all, after the update to 1.0.9 on our test cluster, new weird stonith issues came up. 1) It fails to start stonith resources on *some* nodes === Jul 02 14:43:23 phys-oss3 pengine: [18077]: WARN: unpack_rsc_op: Processing failed op st-rilo

Re: [Pacemaker] starting resources: Interrupted system call

2010-07-01 Thread Bernd Schubert
Never mind, seems to be fixed in 1.0.9 Thanks, Bernd On Thursday, July 01, 2010, Bernd Schubert wrote: > Hi all, > > there seems to be a new regression in pacemaker-1.0.8 (or cluster-glue > or whatever, really difficult to differentiate the layers). > > ul 01 15:04:37 phys-

[Pacemaker] starting resources: Interrupted system call

2010-07-01 Thread Bernd Schubert
tarted" is-managed="true" Shall I open a bug entry and attach hb_report, or is it a known issue? Thanks, Bernd -- Bernd Schubert DataDirect Networks

Re: [Pacemaker] abrupt power failure problem

2010-06-15 Thread Bernd Schubert
On Tuesday 15 June 2010, Dejan Muhamedagic wrote: > Hi, > > On Tue, Jun 15, 2010 at 02:25:51PM -0600, Dan Urist wrote: > > On Tue, 15 Jun 2010 22:08:37 +0200 > > > > Dejan Muhamedagic wrote: > > > Hi, > > > > > > On Tue, Jun 15, 2010 at 01:15:08PM -0600, Dan Urist wrote: > > > > I've recently had

Re: [Pacemaker] abrupt power failure problem

2010-06-15 Thread Bernd Schubert

Re: [Pacemaker] abrupt power failure problem

2010-06-15 Thread Bernd Schubert
On Tuesday 15 June 2010, Schaefer, Diane E wrote: > Hi, > We are having trouble with our two node cluster after one node > experiences an abrupt power failure. The resources do not seem to start > on the remaining node (ie DRBD resources do not promote to master). In > the log we notice: >

Re: [Pacemaker] Lustre and Multiple Mount Protection

2009-12-31 Thread Bernd Schubert
Hello Dejan, On Wednesday 30 December 2009, Dejan Muhamedagic wrote: > Hi, > > On Wed, Dec 30, 2009 at 01:31:27PM +0100, Bernd Schubert wrote: > > Hello Dejan, > > > > On Thursday 24 December 2009, Dejan Muhamedagic wrote: > > > > No, without Multiple

Re: [Pacemaker] Lustre and Multiple Mount Protection

2009-12-30 Thread Bernd Schubert
; file a bugzilla if the RA does something unexpected. The Filesystem agent behaves correctly; Lustre simply must not claim the device is unmounted when it is not. One of these bugs will be fixed in the next Lustre release, and I still need to analyze the other one. That is why one should us

Re: [Pacemaker] Lustre and Multiple Mount Protection

2009-12-23 Thread Bernd Schubert
are those annoying bugs that tell you the device is unmounted although it is not. My Lustre server agent, which I will submit here once I find some time to review it again, will protect you from this. At least I hope I caught all Lustre bugs... And then pacemaker does not protect you to moun

Re: [Pacemaker] resource stickyness

2009-11-12 Thread Bernd Schubert
On Thursday 12 November 2009, Andrew Beekhof wrote: > On Thu, Nov 12, 2009 at 11:54 AM, Bernd Schubert wrote: > > Hello, > > > > I try to prevent auto-migration back from mds2 to mds1, but somehow resource-stickiness doesn't seem to work. After a fa

[Pacemaker] resource stickyness

2009-11-12 Thread Bernd Schubert
s3 location location-MDT_HC3WORK.oss4 MDT_HC3WORK -inf: oss4 MDT-HC3WORK is also part of a resource group, but the resource

Re: [Pacemaker] how to configure IPMI resource

2009-11-11 Thread Bernd Schubert
\ userid=root passwd=password interface=lanplus \ min_off_time=60 off_time=60 on_time=120 \ op monitor interval=600 timeout=240 It will reset "server1" using the IPMI-IP "ipmi-ip_of_server_1". -- Bernd Schubert DataDirect Networks

Re: [Pacemaker] server lockup failures

2009-10-30 Thread Bernd Schubert
On Friday 30 October 2009, Lars Marowsky-Bree wrote: > On 2009-10-29T09:58:13, Andrew Beekhof wrote: > > > Heartbeat based, I still didn't have the time to look into openais. > > > > I guess heartbeat wasn't hung then... otherwise it would have stopped > > sending "i'm here" packets (and dropped o

Re: [Pacemaker] server lockup failures

2009-10-28 Thread Bernd Schubert
On Wednesday 28 October 2009, Andrew Beekhof wrote: > On Wed, Oct 28, 2009 at 2:44 PM, Bernd Schubert > > wrote: > > On Wednesday 28 October 2009, Andrew Beekhof wrote: > >> On Wed, Oct 28, 2009 at 1:05 PM, Bernd Schubert > >> > >> wrote: > >&g

Re: [Pacemaker] server lockup failures

2009-10-28 Thread Bernd Schubert
On Wednesday 28 October 2009, Andrew Beekhof wrote: > On Wed, Oct 28, 2009 at 1:05 PM, Bernd Schubert > > wrote: > > Hello, > > > > I think there is a severe server failure pacemaker doesn't detect. Over > > night a Lustre server failed in shrink_icache_memo

[Pacemaker] server lockup failures

2009-10-28 Thread Bernd Schubert
I think I should be able to reproduce this rather quickly by adding a wrong dcache_lock into Lustre. The question now is how we can fix this in pacemaker. Thanks, Bernd -- Bernd Schubert DataDirect Networks

Re: [Pacemaker] Monitoring a pacemaker cluster

2009-09-15 Thread Bernd Schubert
en resource operations take place. -e, --external-recipient=value A recipient for your program (assuming you want the program to send something to someone). Thanks, Bernd -- Bernd Schubert DataDirect Networks

Re: [Pacemaker] fence timeouts

2009-07-30 Thread Bernd Schubert
Hello Satomi, On Monday 27 July 2009, Satomi TANIGUCHI wrote: > Hi Bernd, > > With recent Pacemaker, > you can write "stonith-timeout" in each stonith plugin's > to set its timeout value. Thanks for your help! I was rather busy during the last few days. For now we have it as a cluster property, but I

Re: [Pacemaker] stonith suice / chaining stonith agents

2009-07-27 Thread Bernd Schubert
On Thursday 23 July 2009, Florian Haas wrote: > http://clusterlabs.org/wiki/TODO > > * Implement cascading STONITH (If method A fails, try B, etc) > > Scheduled for 1.2, it seems. Unless Andrew has changed his mind. :) Ah, it is called "cascading". Thanks! Cheers, Bernd
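The idea behind cascading STONITH is an ordered fallback: try method A, and only if it fails try B, and so on. A minimal sketch of that control flow, assuming each fencing method reports whether it confirmed the node is down. `fence_fn`, `fence_cascade` and the two demo stand-ins are hypothetical and say nothing about how Pacemaker implements it.

```c
#include <stdbool.h>
#include <stddef.h>

/* A fencing method returns true if it confirmed the node is down. */
typedef bool (*fence_fn)(const char *node);

/* Try each method in order and stop at the first success.
 * Returns the index of the method that worked, or -1 if all failed. */
static int fence_cascade(const char *node, const fence_fn *methods, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (methods[i](node))
            return (int)i;
    }
    return -1;
}

/* Demo stand-ins for real agents (e.g. IPMI first, then a power switch). */
static bool demo_ipmi_unreachable(const char *node) { (void)node; return false; }
static bool demo_power_switch_ok(const char *node)  { (void)node; return true; }
```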

Re: [Pacemaker] fence timeouts

2009-07-27 Thread Bernd Schubert
On Friday 24 July 2009, Andrew Beekhof wrote: > On Fri, Jul 24, 2009 at 1:41 AM, Bernd Schubert wrote: > > Hello, > > > > I am trying to increase the fence timeouts, but however much I try, I can't figure out how that works. > > [snip]

[Pacemaker] fence timeouts

2009-07-23 Thread Bernd Schubert
me timeout --attr-value 300s but this is also not used as the default stonith timeout. I would really be glad if someone could tell me what the default stonith timeout is and how to set timeouts per stonith resource. Thanks in advance, Bernd -- Bernd Schubert DataDirec

[Pacemaker] stonith suice / chaining stonith agents

2009-07-23 Thread Bernd Schubert
find a web reference to that (maybe I'm searching for the wrong keywords?). Any ideas? Thanks, Bernd -- Bernd Schubert DataDirect Networks