Currently I am using a ping resource to ensure that other windows VM's don't
start up until the domain controllers are started. This helps prevent things
like Exchange not starting properly because they can't find a domain controller.
On a complete network shutdown (eg a long power failure), win
>
> On 2012-12-20T05:14:56, James Harper
> wrote:
>
> > I have a resource that returned 'not installed' because (I think) I had
> > forgotten to install the required package. I've installed the package now
> > but I still see the following every 15 min
> >
> > Any cib change throws the system load up for 10-20 seconds, and then
> > things start timing out, despite having set the timeouts well in excess
> > of the time it takes for pacemaker to mark the resource as timed out.
>
> Hmm, unless your CIB (the configuration) is really huge, that
> Hi,
>
> On Tue, Dec 18, 2012 at 10:58:18AM +0000, James Harper wrote:
> > For the following failure:
> >
> > Failed actions:
> > p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2,
> > status=Timed Out): unknown exec error
> >
> >
I have a resource that returned 'not installed' because (I think) I had
forgotten to install the required package. I've installed the package now but I
still see the following every 15 minutes:
Preventing ocfs2mgmt from re-starting on node1: operation monitor failed 'not
installed' (rc=5)
As f
>
> What is the behaviour of a cluster when the nodes are up to 10 minutes out
> of sync with each other, because they've just been booted up after a crash
> and the hwclocks are out of date and there is no ntp time source reachable?
> Could it cause lots of sig11's and constant re-elections becau
What is the behaviour of a cluster when the nodes are up to 10 minutes out of
sync with each other, because they've just been booted up after a crash and the
hwclocks are out of date and there is no ntp time source reachable? Could it
cause lots of sig11's and constant re-elections because that'
For the following failure:
Failed actions:
p_lvm_iscsi:0_monitor_1 (node=bitvs6, call=57, rc=-2, status=Timed
Out): unknown exec error
Is this the ra itself returning a "Timed Out" error, or is it the cluster
software determining that the ra is taking too long and so killing it and
dec
Does anyone have an ra that allows a stonith action to disable the ports on a
switch for a node? Obviously this would only allow a "shutdown" action not a
"reboot" action, but in the absence of anything else it would definitely ensure
that the errant node was ejected from the network. I guess re
I've been having a problem with the lvm ra when used in conjunction with clvm.
When a node dies (eg when I destroy the vm to test this particular scenario),
clvm re-organises itself just fine, and comes good well within the lvm ra
timeout I set (60 seconds), but if the "vgdisplay -v vg-drbd" comma
I'm using external/ssh in my test cluster (a bunch of vm's), and for some
reason the cluster has tried to terminate it but failed, like:
Oct 14 15:54:45 ctest0 stonith-ng: [2006]: info: call_remote_stonith: Requesting
that ctest2 perform op off ctest1
Oct 14 15:54:45 ctest0 stonith-ng: [2006]: in
> -Original Message-
> From: James Harper [mailto:james.har...@bendigoit.com.au]
> Sent: Saturday, 13 October 2012 1:48 AM
> To: pacemaker@oss.clusterlabs.org
> Subject: [Pacemaker] node utilization error
>
> When I try and set the utilization on a node I get
When I try and set the utilization on a node I get this:
# crm node utilization node5 set memory 3
Error setting memory=3 (section=nodes, set=nodes-node5-utilization): Update
does not conform to the configured schema/DTD
Error performing operation: Update does not conform to the configure
FWIW, I'm running ocfs2 and looking through the logs a bit more my symptoms
seem to match those discussed here -
http://www.mentby.com/Group/linux-ha/crmd-31942-warn-decodetransitionkey-bad-uuid-crm-resource-25438-in-sscanf-result-3-for-00crm-resource-25438.html
And my test cluster (built on a b
>
> Questions:
> - are you making any config changes when this behaviour is occurring?
> - if so, from one node only or many?
> - what version is this? 1.1.7 or 1.1.7 plus some debian patches? which
> patches?
>
One other thing I just noticed is that ntp wasn't working on the two nodes used
fo
> > I guess I'd first like to know if the log entries I was seeing ("Failed
> > application of an update diff" and "Requesting re-sync from peer") means
> > that a full resync is being done, and if that's a problem or not.
>
> There are occasions when its not a problem, but I don't think any of th
> On 10/09/2012 01:42 PM, James Harper wrote:
> > As per previous post, I'm seeing very high cib load whenever I make a
> > configuration change, enough load that things timeout seemingly
> > instantly. I thought this was happening well before the configured
> >
As per previous post, I'm seeing very high cib load whenever I make a
configuration change, enough load that things timeout seemingly instantly. I
thought this was happening well before the configured timeout but now I'm not
so sure, maybe the timeouts are actually working okay and it just seems
This is just a cosmetic thing, but I have a cloned resource that shows up like
this:
Clone Set: c_ping_X [p_ping_X]
Started: [ node1 node2 ]
Stopped: [ p_ping_X:0 p_ping_X:1 p_ping_X:4 ]
That's correct in that the location restriction on the clone set c_ping_X only
allows it to run on no
> Hi,
>
> On Wed, Oct 03, 2012 at 10:07:06PM +0000, James Harper wrote:
> > It seems like every time I modify a resource, things start timing out. Just
> > now I changed the location of where a ping resource could run and this
> > happened:
> > Oct 4 0
> > > > rule 0: ping_gw eq 0
> > > > rule -inf: #uname ne node1 and #uname ne node2
> > > >
> > > > so the first rule says the resource can run on any location if the
> > > > gateway can't be pinged, and the second rule says it can't run on any
> > > > node except node1 and node2. It's an
> - Original Message -
> > From: "James Harper"
> > To: pacemaker@oss.clusterlabs.org
> > Sent: Wednesday, October 3, 2012 5:04:53 PM
> > Subject: [Pacemaker] examine ping result
> >
> > I have a ping resource defined like:
It seems like every time I modify a resource, things start timing out. Just now
I changed the location of where a ping resource could run and this happened:
Oct 4 07:07:07 bitvs5 lrmd: [3681]: WARN: perform_ra_op: the operation
monitor[52] on p_lvm_iscsi:0 for client 3686 stayed in operation lis
I have a ping resource defined like:
primitive p_ping_test ocf:pacemaker:ping \
params name="ping_test" host_list="192.168.200.253" \
op monitor interval="10s" timeout="60s" \
op start interval="0" timeout="60s"
How can I examine the result of "ping_test"?
Thanks
James
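One way to look at the value the ping agent publishes is sketched below. It assumes the agent stores its result as a node attribute named after the name parameter (ping_test here) and that crm_mon is available on the node; the helper name show_ping_attr is made up for illustration and is not part of Pacemaker.

```shell
# Hypothetical helper: list node attributes once and keep the lines that
# mention the given attribute name (e.g. ping_test).
show_ping_attr() {
    attr_name="$1"
    if command -v crm_mon >/dev/null 2>&1; then
        # -A shows node attributes, -1 prints the status once and exits
        crm_mon -A -1 | grep -F "$attr_name"
    else
        echo "crm_mon not found; run this on a cluster node" >&2
        return 127
    fi
}

# Usage on a cluster node:
#   show_ping_attr ping_test
```

If nothing is printed, the attribute may not have been set yet, for example because the ping resource has not completed a monitor on that node.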
> > Just a thought: You could derive your own RA from the VirtualDomain-RA
> > and implement your own migration logic: If target- and
> > source-platforms match, do the live migration, if they don't match
> > make migrate_to() call
> > stop() and migrate_from() call start().
>
> We're still accep
>
> On Mon, Oct 1, 2012 at 5:49 PM, James Harper
> wrote:
> >>
> >> On 2012-09-28 16:24, James Harper wrote:
> >> > I have two nodes running identical hardware which run Xen VM's, and
> >> > want to add a third node to the clu
I am trying to configure stonith. ipmilan was a complete disaster as it kept
passing port 623 as the hostname, so now I'm trying external/ipmi.
The stonith resource is running as expected, but I still have
stonith-enabled="false" in the cluster configuration. To test before I turn
stonith on p
>
> Hi,
>
> On Sun, Sep 30, 2012 at 05:52:58AM +, James Harper wrote:
> > >
> > > rule ping_gw eq 0 and (#uname eq bitvs5 or #uname eq bitvs6)
> > >
> > > and it tells me it doesn't like "and" and "or" in the same
>
> On 2012-09-28 16:24, James Harper wrote:
> > I have two nodes running identical hardware which run Xen VM's, and want
> > to add a third node to the cluster which can access the same clvm and iscsi
> > resources, but it will not be identical hardware. The non-identical h
I just tried to join two machines to a cluster that were previously members of
another cluster. One worked fine, but the other seems to have some problems.
"crm status" says everything is offline, but the other nodes all see it as
online and resources actually seem to start. It seems to say this
>
> rule ping_gw eq 0 and (#uname eq bitvs5 or #uname eq bitvs6)
>
> and it tells me it doesn't like "and" and "or" in the same expression (or I
> think that's what it's telling me)
>
> any suggestions?
>
I think I've figured it out, I need to do it like:
rule 0: ping_gw eq 0
rule -inf: #u
My main router is a Xen VM managed by pacemaker. This causes problems when
working from home - just lately the Xen resource agent has been a bit
temperamental (see other post) and also a new node I'm adding causes clvm to
freeze and the network connection is subsequently dropped requiring a driv
I have a xen resource with op's configured like:
op stop interval="0" timeout="300s" \
op migrate_from interval="0" timeout="300s" \
op migrate_to interval="0" timeout="300s" \
op monitor interval="10s" timeout="90s" \
but it seems like every time I make any change
I have two nodes running identical hardware which run Xen VM's, and want to add
a third node to the cluster which can access the same clvm and iscsi resources,
but it will not be identical hardware. The non-identical hardware means that to
move a VM to this third node it must be stopped then
Further to my last email, I now want to create an ordering rule that says
"exchange server should only start when at least one domain controller is
already started", but I can't see how to do this using crm. There is a syntax
using braces, eg:
order dc_then_exch 0: (dc1 dc2) exch
but I think t
>
> On 09/05/2012 04:08 PM, James Harper wrote:
> > A power failure tonight indicated that my clustered resources (xen vm's)
> > have a dependency requirement like "make sure at least one domain
> > controller VM is fully up and running before starting any other windo
>
> On Thu, Sep 6, 2012 at 12:08 AM, James Harper
> wrote:
> > A power failure tonight indicated that my clustered resources (xen vm's)
> > have a dependency requirement like "make sure at least one domain
> > controller VM is fully up and running before start
A power failure tonight indicated that my clustered resources (xen vm's) have a
dependency requirement like "make sure at least one domain controller VM is
fully up and running before starting any other windows servers". Determining a
status of "fully up and running" is probably complex so as a
Attn listadmins - LinkedIn are more than happy to remove mailing list addresses
from their auto-invite-everyone-in-my-contacts thing, you just need to email
them and tell them what addresses to exclude.
> -Original Message-
> From: Kavan Smith via LinkedIn [mailto:mem...@linkedin.com]
> Se
LinkedIn have a facility to exclude mailing list addresses from the "email
everyone in my contacts and tell them about linkedin" function, you just need
to tell them what those addresses are. Sounds like a job for the list admins...
James
> -Original Message-
> From: Marcio Ribeiro [mai
I have a pair of servers running Xen, with the xen config files stored
on an ocfs2 share mounted on an iscsi volume. A problem has developed
where ocfs2 seems to get stuck (in the monitor script I think), and
because I have a dependency of xen vm's depending on the volume where
the config files are
> >
> > That thread goes around in circles and completely contradicts what I'm
> > seeing. What I'm seeing is that unmanaged resources are never monitored.
>
> would be strange and how do you verify this? A look at your config may also
> help to shed some light on this ...
>
The relevant portion
>
> On 11/29/2011 11:30 AM, James Harper wrote:
> > Is pacemaker expected to monitor stopped and/or unmanaged resources?
> > I'm guessing not because that's what I've observed but I can't find
> > the behaviour documented anywhere so maybe there is a c
Is pacemaker expected to monitor stopped and/or unmanaged resources? I'm
guessing not because that's what I've observed but I can't find the
behaviour documented anywhere so maybe there is a config option I can
tweak?
Thanks
James