Re: [Pacemaker] Question about the resource to fence a node

2013-11-12 Thread Andrew Beekhof
On 16 Oct 2013, at 8:51 am, Andrew Beekhof wrote: > > On 15/10/2013, at 8:24 PM, Kazunori INOUE wrote: > >> Hi, >> >> I'm using pacemaker-1.1 (the latest devel). >> I started resource (f1 and f2) which fence vm3 on vm1. >> >> $ crm_mon -1 >> Last updated: Tue Oct 15 15:16:37 2013 >> Last ch

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 11:49 am, Sean Lutner wrote: > > >> On Nov 12, 2013, at 7:33 PM, Andrew Beekhof wrote: >> >> >>> On 13 Nov 2013, at 11:22 am, Sean Lutner wrote: >>> >>> >>> On Nov 12, 2013, at 6:01 PM, Andrew Beekhof wrote: > On 13 Nov 2013, at 6:10 am, Sean Lut

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Sean Lutner
> On Nov 12, 2013, at 7:33 PM, Andrew Beekhof wrote: > > >> On 13 Nov 2013, at 11:22 am, Sean Lutner wrote: >> >> >> >>> On Nov 12, 2013, at 6:01 PM, Andrew Beekhof wrote: >>> >>> On 13 Nov 2013, at 6:10 am, Sean Lutner wrote: The folks testing the cluster I've been bui

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 11:22 am, Sean Lutner wrote: > > >> On Nov 12, 2013, at 6:01 PM, Andrew Beekhof wrote: >> >> >>> On 13 Nov 2013, at 6:10 am, Sean Lutner wrote: >>> >>> The folks testing the cluster I've been building have run a script which >>> blocks all traffic except SSH on one nod

Re: [Pacemaker] asymmetric clusters, remote nodes, and monitor operations

2013-11-12 Thread Andrew Beekhof
On 12 Sep 2013, at 3:44 am, Lindsay Todd wrote: > What I am seeing in the syslog are messages like: > > Sep 11 13:19:52 db02 pacemaker_remoted[1736]: notice: operation_finished: > p-mysql_monitor_2:19398:stderr [ 2013/09/11_13:19:52 INFO: MySQL monitor succeeded ] > Sep 11 13:20

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Sean Lutner
> On Nov 12, 2013, at 6:01 PM, Andrew Beekhof wrote: > > >> On 13 Nov 2013, at 6:10 am, Sean Lutner wrote: >> >> The folks testing the cluster I've been building have run a script which >> blocks all traffic except SSH on one node of the cluster for 15 seconds to >> mimic a network failure

Re: [Pacemaker] The larger cluster is tested.

2013-11-12 Thread Andrew Beekhof
Did you look at the load numbers in the logs? The CPUs are being slammed for over 20 minutes. The automatic tuning can only help so much; you're simply asking the cluster to do more work than it is capable of. Giving more priority to cib operations that come via IPC is one option, but as I explai

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-12 Thread Andrew Beekhof
On 12 Nov 2013, at 4:42 pm, Andrey Groshev wrote: > > > 11.11.2013, 03:44, "Andrew Beekhof" : >> On 8 Nov 2013, at 7:49 am, Andrey Groshev wrote: >> >>> Hi, PPL! >>> I need help. I do not understand why it has stopped working. >>> This configuration works on another cluster, but on corosync1

Re: [Pacemaker] Follow up: Colocation constraint to External Managed Resource (cluster-recheck-interval="5m" ignored after 1.1.10 update?)

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 12:06 am, Robert H. wrote: > Hello, > > for Pacemaker 1.1.8 (CentOS Version) the thread > http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg18048.html was > solved by adding cluster-recheck-interval="5m", causing the LRM It's the policy engine btw. Not the lrm

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 6:10 am, Sean Lutner wrote: > The folks testing the cluster I've been building have run a script which > blocks all traffic except SSH on one node of the cluster for 15 seconds to > mimic a network failure. During this time, the network being "down" seems to > cause some od

Re: [Pacemaker] recover cib from raw file

2013-11-12 Thread Lars Marowsky-Bree
On 2013-11-12T09:51:02, "s.oreilly" wrote: > Brilliant, thanks Andrew. I was looking for a pcs option. Should have thought > about cibadmin. Hopefully I will never break things badly enough to have to > use > it :-) crm configure load xml ... ;-) Regards, Lars -- Architect Storage/HA SU

[Pacemaker] Network outage debugging

2013-11-12 Thread Sean Lutner
The folks testing the cluster I've been building have run a script which blocks all traffic except SSH on one node of the cluster for 15 seconds to mimic a network failure. During this time, the network being "down" seems to cause some odd behavior from pacemaker resulting in it dying. The clus

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-12 Thread Vladislav Bogdanov
12.11.2013 09:56, Vladislav Bogdanov wrote: ... > Ah, then in_ccm will be set to false only when corosync (2) is stopped > on a node, not when pacemaker is stopped? > > Thus, current drbd agent/fencing logic does not (well) support just stop > of pacemaker in my use-case, messaging layer should be

[Pacemaker] Follow up: Colocation constraint to External Managed Resource (cluster-recheck-interval="5m" ignored after 1.1.10 update?)

2013-11-12 Thread Robert H.
Hello, for Pacemaker 1.1.8 (CentOS Version) the thread http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg18048.html was solved by adding cluster-recheck-interval="5m", causing the LRM to be executed every 5 minutes and detecting externally managed resources as started (in this ca

Re: [Pacemaker] recover cib from raw file

2013-11-12 Thread Andrew Beekhof
I wouldn't be surprised to see a relevant pcs command in the future ;-) On 12 Nov 2013, at 8:51 pm, s.oreilly wrote: > Brilliant, thanks Andrew. I was looking for a pcs option. Should have thought > about cibadmin. Hopefully I will never break things badly enough to have to > use > it :-) > >

Re: [Pacemaker] recover cib from raw file

2013-11-12 Thread s.oreilly
Brilliant, thanks Andrew. I was looking for a pcs option. Should have thought about cibadmin. Hopefully I will never break things badly enough to have to use it :-) Regards Sean O'Reilly On Mon 11/11/13 10:03 PM , "Andrew Beekhof" and...@beekhof.net sent: > > On 11 Nov 2013, at 9:41 pm, s.orei