On 03/28/2012 04:39 PM, Florian Haas wrote:
[...]
Clearly this resource is not running on all nodes, so why is it
being reported as such?
Probably because your resource agent reports OCF_SUCCESS on a probe
operation when it ought to be returning OCF_NOT_RUNNING. Pastebin the
source of ocf:hydra
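
A minimal sketch of the probe behaviour described above, not the actual ocf:hydra agent (its source is not shown here). The function names, the PID-file check and the HYDRA_PIDFILE variable are assumptions for illustration; only the two return codes are taken from the OCF API. The cluster runs monitor as a probe on every node, so it has to return OCF_NOT_RUNNING wherever the resource is not active:

```shell
#!/bin/sh
# OCF return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical activity check: the resource counts as active if its PID
# file exists and names a live process. HYDRA_PIDFILE is an assumed name.
hydra_is_active() {
    pidfile=${HYDRA_PIDFILE:-/var/run/hydra.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# monitor doubles as the probe operation, so it must not report success
# on a node where the resource was never started.
hydra_monitor() {
    if hydra_is_active; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

An agent that returns 0 from monitor unconditionally would make the resource appear started on every node that gets probed.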
Hello Brian,
On 08/23/2011 10:56 PM, Brian J. Murrell wrote:
Hi All,
I am trying to configure pacemaker (1.0.10) to make a single filesystem
highly available by two nodes (please don't be distracted by the dangers
of multiply mounted filesystems and clustering filesystems, etc., as I
am absolut
> fs_01_stop_0 (call=74, rc=2, cib-update=90, confirmed=true) invalid
> > parameter
> >
> > Suddenly, it failed to stop!
Yeah, known bug. Frequently comes up with 1.0.9, rarely with 1.0.7 (no idea
about 1.0.8) and supposed to be fixed in 1.0.10. Basically, it mad
On Thursday, October 21, 2010, Rasto Levrinc wrote:
> On Thu, October 21, 2010 12:42 pm, Bernd Schubert wrote:
> > Hi all,
> >
> >
> > is there a better way to detect a failed resource than to run "crm_mon -1
> > -r"?
>
> Well, you could
also no fun and also is not really fast.
So I'm looking for *any* sane way to clean up resources or at least
for a good parse-able way to get failed resources and the corresponding
node.
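
One possible stop-gap, sketched under an assumption about the output format: it filters text such as that printed by "crm_mon -1 -r" and assumes that a failed resource shows up on a line of the form "name (agent): Started node FAILED". That line layout and the function name failed_resources are assumptions, and the layout may differ between versions or with other crm_mon options:

```shell
#!/bin/sh
# Read crm_mon-style status text on stdin and print "resource node" for
# every line that carries the word FAILED as a separate field.
# Assumed line layout: name (agent): Started node [(flag)] FAILED
failed_resources() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i != "FAILED") continue
            # Walk back over parenthesised flags to reach the node name.
            j = i - 1
            while (j > 1 && substr($j, 1, 1) == "(") j--
            print $1, $j
            break
        }
    }'
}
```

It could then be used as "crm_mon -1 -r | failed_resources", with the output fed into whatever cleanup is wanted per resource and node.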
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
S
resources, but still everything is in global pacemaker setup.
We also have syslog-ng rules and a patched logd (patches sent to this list,
need to update them again) to filter out all pacemaker debug logs, so that we
can easily see messages from the lustre RA in syslogs.
Cheers,
Bernd
--
Be
Hello Andrew,
any chance you could add a few lines to the .hg/hgrc of the online repository?
Or to /etc/mercurial/hgrc or /etc/mercurial/hgrc.d?
Reading patches is easier if function names are provided...
[diff]
git = True
nodates = True
showfunc = True
Thanks,
Bernd
--
Bernd Schubert
cl_log: Only close file descriptors if they had been opened
This patch also could be merged with the 6th patch in the series
(restore old open/write/close semantics). It fixes a valgrind
warning about invalid close().
Signed-off-by: Bernd Schubert
diff --git a/lib/clplumbing/cl_log.c b/lib
ha_logd: Add a SIGHUP signal handler to close/open log files
Without the signal handler cl_log uses inefficient IO, as it
has to open/seek/flush/close the log files in order to allow
cron log file rotation.
Signed-off-by: Bernd Schubert
diff --git a/include/clplumbing/cl_log.h b/include
also uses
system IO (open/close/write) instead of libc IO (fopen/fclose/fwrite).
Libc IO has a buffer, which is not suitable for log files (in case of
a stonith, the whole buffer, which might be large, will be missing from
the log files).
Signed-off-by: Bernd Schubert
diff --git a/lib/clplumbing/cl_log.c
cl_log: Clean up white space
Signed-off-by: Bernd Schubert
diff --git a/lib/clplumbing/cl_log.c b/lib/clplumbing/cl_log.c
--- a/lib/clplumbing/cl_log.c
+++ b/lib/clplumbing/cl_log.c
@@ -161,8 +161,8 @@ cl_log_get_logdtime(void)
void
cl_log_set_logdtime(int logdtime
ha_logd: New option to disable syslog logging
As we already write ha-log and ha-debug, users might want to disable syslog
logging.
Signed-off-by: Bernd Schubert
diff --git a/logd/ha_logd.c b/logd/ha_logd.c
--- a/logd/ha_logd.c
+++ b/logd/ha_logd.c
@@ -91,6 +91,7 @@ static struct {
int
simple
filter rules
Signed-off-by: Bernd Schubert
diff --git a/lib/clplumbing/cl_log.c b/lib/clplumbing/cl_log.c
--- a/lib/clplumbing/cl_log.c
+++ b/lib/clplumbing/cl_log.c
@@ -543,7 +543,7 @@ cl_direct_log(int priority, const char*
int needprivs = !cl_have_full_privs();
if
ha_logd: Use C99 initializers, also correct max entity string length
C99 initializers are easier to read.
Signed-off-by: Bernd Schubert
diff --git a/logd/ha_logd.c b/logd/ha_logd.c
--- a/logd/ha_logd.c
+++ b/logd/ha_logd.c
@@ -87,18 +87,18 @@ static gboolean needs_shutdown = FALSE;
static
cl_log: Simplify a function
Signed-off-by: Bernd Schubert
diff --git a/lib/clplumbing/cl_log.c b/lib/clplumbing/cl_log.c
--- a/lib/clplumbing/cl_log.c
+++ b/lib/clplumbing/cl_log.c
@@ -545,7 +545,7 @@ cl_direct_log(int priority, const char*
entity =cl_log_entity
cl_log: Make functions static and remove CircularBuffer
CircularBuffer was added more than 5 years ago and still it is not used.
So remove dead code, it can be retrieved from the repository history
if required.
Also make functions static that are only used within cl_log.c
Signed-off-by: Bernd Schubert
Hi all,
the following patches are to better handle bug 2470 and have some generic
improvements. I'm not sure if I shall attach it to the bugzilla or if the
mailing list is preferred.
Thanks,
Bernd
--
Bernd Schubert
DataDirect Net
up-by-node.
Actually we use a wrapper that calls "crm_mon -1 -r -n" to give us the
cluster status. Besides the so far missing "unmanaged" flag, "FAILED" is also
important information that is missing.
Thanks,
Bernd
(black box), that provides for
example NFS to clients. You would want to have each and every additional
service mirrored again. And you could not rely on additional customer NFS
clients.
>
> May be easier, safer, and more transparent than
> no-quorum=ignore plus some ping attribute bas
On Thursday, September 02, 2010, Lars Ellenberg wrote:
> On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote:
> > On Thursday, September 02, 2010, Andrew Beekhof wrote:
> > > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert
> > >
> > > > My proposa
On Thursday, September 02, 2010, Andrew Beekhof wrote:
> On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert
> > My proposal is to rip out all network code out of pingd and to add
> > slightly modified files from 'iputils'.
>
> Close, but thats not portable.
> In
ping RA: The host list must be provided
While pingd can connect to heartbeat to get all
peer nodes, the ping script RA cannot do that.
Accordingly the hostlist is a required argument.
Signed-off-by: Bernd Schubert
diff --git a/extra/resources/ping b/extra/resources/ping
--- a/extra
on, as pingd.c includes a function from
iputils ping. While the function is marked accordingly, it still does not
include the original license statement, which is IMHO a clear license
violation.
I could probably do that quickly, but don't want to do something that is not
accepted upst
This is virtual machine
test cluster and I recently renamed all host names. But I used the old host names
for the location :( I think we should add a warning message to crm shell if
a location host name is used that is not defined in the cluster.
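
The kind of check meant here could look roughly like the sketch below. It is not crm shell code; the function name and the idea of passing both node lists as space-separated strings are assumptions made for illustration:

```shell
#!/bin/sh
# Print a warning for every node name used in a location constraint
# that is not in the list of nodes known to the cluster.
# $1: known cluster nodes, $2: node names used in location constraints
# (both space-separated; hypothetical interface).
unknown_location_nodes() {
    known=$1
    used=$2
    for node in $used; do
        found=no
        for k in $known; do
            if [ "$node" = "$k" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            printf 'WARNING: location references unknown node %s\n' "$node"
        fi
    done
}
```

With the renamed hosts from this test cluster, every location that still uses an old name would produce one warning line.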
Sorry again and thanks f
Sorry for the double post and while I'm reading my own mail, I found it, I
used the wrong host names in the location constraints :( That also explains
why it worked on another cluster.
Sorry for the noise,
Bernd
On Thursday, August 26, 2010, Bernd Schubert wrote:
> Hi all,
>
>
reciated.
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http:
Hi all,
I'm trying to start a pingd clone resource on an asymmetric cluster.
I specified locations, but it still refuses to start pingd
===
[r...@vrhel5-mds1 ha.d]# cat pingd.cib
primitive pingdnet1 ocf:pacemaker:pingd \
        params h
then run into
random issues all the time...).
Cheers,
Bernd
--
Bernd Schubert
DataDirect Networks
Hello all,
is there a way to overwrite the quorum policy decision, let's say to
"no quorum with n/2 - 1 nodes" or "no quorum if no access to any other node"?
Thanks,
Bernd
e: Sending
flush op to all hosts for: last-failure-ost_demofs_0
(1278886216)
I guess I need to file a bugzilla, but I won't have time before Wednesday.
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
How can it happen that parameters are missing in 1.0.9?
The following condition is *sometimes* triggered (in our lustre_server agent,
which is a modified Filesystem agent)
# It is possible that OCF_RESKEY_directory has one or even multiple trailing "/".
# But the output of `mount` and /proc/
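
A small sketch of how such trailing slashes could be normalised before the directory is compared against mount output. This is not the code from the lustre_server agent, which is cut off above; the function name is an assumption:

```shell
#!/bin/sh
# Strip all trailing "/" from a directory path, keeping a lone "/" intact,
# so that "/mnt/lustre//" and "/mnt/lustre" compare as equal.
strip_trailing_slashes() {
    dir=$1
    # Remove one trailing "/" per iteration until none is left.
    while [ "$dir" != "/" ] && [ "${dir%/}" != "$dir" ]; do
        dir=${dir%/}
    done
    printf '%s\n' "$dir"
}
```

An agent could call it once, for example as dir=$(strip_trailing_slashes "$OCF_RESKEY_directory"), and use the result in all later comparisons.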
Hello all,
after the update to 1.0.9 on our test cluster, new weird stonith issues
come up.
1) It fails to start stonith resources on *some* nodes
===
Jul 02 14:43:23 phys-oss3 pengine: [18077]: WARN: unpack_rsc_op: Processing
failed op st-rilo
Never mind, seems to be fixed in 1.0.9
Thanks,
Bernd
On Thursday, July 01, 2010, Bernd Schubert wrote:
> Hi all,
>
> there seems to be a new regression in pacemaker-1.0.8 (or cluster-glue
> or whatever, really difficult to differentiate the layers).
>
> Jul 01 15:04:37 phys-
tarted" is-managed="true"
Shall I open a bug entry and attach hb_report or is it a known issue?
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
On Tuesday 15 June 2010, Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Jun 15, 2010 at 02:25:51PM -0600, Dan Urist wrote:
> > On Tue, 15 Jun 2010 22:08:37 +0200
> >
> > Dejan Muhamedagic wrote:
> > > Hi,
> > >
> > > On Tue, Jun 15, 2010 at 01:15:08PM -0600, Dan Urist wrote:
> > > > I've recently had
ONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.
>
>
> -Original Message-
On Tuesday 15 June 2010, Schaefer, Diane E wrote:
> Hi,
> We are having trouble with our two node cluster after one node
> experiences an abrupt power failure. The resources do not seem to start
> on the remaining node (ie DRBD resources do not promote to master). In
> the log we notice:
>
Hello Dejan,
On Wednesday 30 December 2009, Dejan Muhamedagic wrote:
> Hi,
>
> On Wed, Dec 30, 2009 at 01:31:27PM +0100, Bernd Schubert wrote:
> > Hello Dejan,
> >
> > On Thursday 24 December 2009, Dejan Muhamedagic wrote:
> >
> > No, without Multiple
; file a bugzilla if the RA does something unexpected.
The Filesystem agent behaves correctly, just Lustre must not claim the device
is umounted although it is not. One of these bugs will be fixed in the next
Lustre release and another one I still need to analyze.
That is why one should us
are those annoying bugs that tell you the device is umounted
although it is not. My lustre server agent, which I will submit here once I
find some time to review it again, will protect you from this. At least I hope
I did catch all Lustre bugs...
And then pacemaker does not protect you to moun
On Thursday 12 November 2009, Andrew Beekhof wrote:
> On Thu, Nov 12, 2009 at 11:54 AM, Bernd Schubert
>
> wrote:
> > Hello,
> >
> > I try to prevent auto-migration back from mds2 to mds1, but somehow
> > resource-stickiness doesn't seem to work. After a fa
s3
location location-MDT_HC3WORK.oss4 MDT_HC3WORK -inf: oss4
MDT-HC3WORK is also part of a resource group, but the resource
\
userid=root passwd=password interface=lanplus \
min_off_time=60 off_time=60 on_time=120 \
op monitor interval=600 timeout=240
It will reset "server1" using the IPMI-IP "ipmi-ip_of_server_1".
--
Bernd Schubert
DataDirect Networks
On Friday 30 October 2009, Lars Marowsky-Bree wrote:
> On 2009-10-29T09:58:13, Andrew Beekhof wrote:
> > > Heartbeat based, I still didn't have the time to look into openais.
> >
> > I guess heartbeat wasn't hung then... otherwise it would have stopped
> > sending "i'm here" packets (and dropped o
On Wednesday 28 October 2009, Andrew Beekhof wrote:
> On Wed, Oct 28, 2009 at 2:44 PM, Bernd Schubert
>
> wrote:
> > On Wednesday 28 October 2009, Andrew Beekhof wrote:
> >> On Wed, Oct 28, 2009 at 1:05 PM, Bernd Schubert
> >>
> >> wrote:
> >>
On Wednesday 28 October 2009, Andrew Beekhof wrote:
> On Wed, Oct 28, 2009 at 1:05 PM, Bernd Schubert
>
> wrote:
> > Hello,
> >
> > I think there is a severe server failure pacemaker doesn't detect. Over
> > night a Lustre server failed in shrink_icache_memo
I think I should be able to reproduce this rather quickly, by adding a wrong
dcache_lock into Lustre. The question is now how can we fix this in pacemaker?
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
en resource operations take
place.
-e, --external-recipient=value A recipient for your program (assuming you
want the program to send something to someone).
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
Hello Satomi,
On Monday 27 July 2009, Satomi TANIGUCHI wrote:
> Hi Bernd,
>
> With recent Pacemaker,
> you can write "stonith-timeout" in each stonith plugin's
> to set its timeout value.
thanks for your help! I was rather busy during the last days. For now we have
it as cluster property, but I
On Thursday 23 July 2009, Florian Haas wrote:
> http://clusterlabs.org/wiki/TODO
>
> * Implement cascading STONITH (If method A fails, try B, etc)
>
> Scheduled for 1.2, it seems. Unless Andrew has changed his mind. :)
Ah, it is called "cascading". Thanks!
Cheers,
Bernd
On Friday 24 July 2009, Andrew Beekhof wrote:
> On Fri, Jul 24, 2009 at 1:41 AM, Bernd
>
> Schubert wrote:
> > Hello,
> >
> > I try to increase the fence timeouts, but as much as I try, I can't
> > figure out how that works.
>
> [snip]
>
> >
me timeout --attr-value 300s
but this is also not used as default stonith timeout.
I really would be glad if someone could tell me which value has the default
stonith timeout and how to set timeouts per stonith resource.
Thanks in advance,
Bernd
--
Bernd Schubert
DataDirec
find web reference to that (maybe I'm searching for the wrong
keywords?).
Any ideas?
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks