Hi Andrew,
The corosync.conf is configured as follows:
> service {
> # Load the Pacemaker Cluster Resource Manager
> name: pacemaker
> ver: 0
> }
and pacemaker is not being started separately via "service pacemaker start".
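From what we understand (per the plugin documentation, so treat this as a
sketch rather than anything verified against our setup), "ver: 0" tells
corosync itself to spawn the pacemaker daemons, whereas "ver: 1" leaves that
to a separate "service pacemaker start". The alternative block would look
like:

service {
        # With ver: 1 the pacemaker daemons are started separately,
        # e.g. via the pacemaker init script
        name: pacemaker
        ver: 1
}

So with "ver: 0", as configured above, we have made sure the pacemaker init
script is never run, since that would be expected to start a second set of
daemons.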
Here is an extract from the logs, with extra debug enabled, from an attempt
to start corosync/pacemaker:
06:59:20 corosync [MAIN ] Corosync Cluster Engine ('1.4.1'): started and
ready to provide service.
06:59:20 corosync [MAIN ] Corosync built-in features: nss dbus rdma snmp
06:59:20 corosync [MAIN ] Successfully read main configuration file
'/etc/corosync/corosync.conf'.
06:59:20 corosync [TOTEM ] waiting_trans_ack changed to 1
06:59:20 corosync [TOTEM ] Token Timeout (5000 ms) retransmit timeout (247 ms)
06:59:20 corosync [TOTEM ] token hold (187 ms) retransmits before loss (20
retrans)
06:59:20 corosync [TOTEM ] join (1000 ms) send_join (0 ms) consensus (7500 ms)
merge (200 ms)
06:59:20 corosync [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
06:59:20 corosync [TOTEM ] seqno unchanged const (30 rotations) Maximum
network MTU 1402
06:59:20 corosync [TOTEM ] window size per rotation (50 messages) maximum
messages per rotation (20 messages)
06:59:20 corosync [TOTEM ] missed count const (5 messages)
06:59:20 corosync [TOTEM ] send threads (0 threads)
06:59:20 corosync [TOTEM ] RRP token expired timeout (247 ms)
06:59:20 corosync [TOTEM ] RRP token problem counter (2000 ms)
06:59:20 corosync [TOTEM ] RRP threshold (10 problem count)
06:59:20 corosync [TOTEM ] RRP multicast threshold (100 problem count)
06:59:20 corosync [TOTEM ] RRP automatic recovery check timeout (1000 ms)
06:59:20 corosync [TOTEM ] RRP mode set to none.
06:59:20 corosync [TOTEM ] heartbeat_failures_allowed (0)
06:59:20 corosync [TOTEM ] max_network_delay (50 ms)
06:59:20 corosync [TOTEM ] HeartBeat is Disabled. To enable set
heartbeat_failures_allowed > 0
06:59:20 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
06:59:20 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt
SOBER128/SHA1HMAC (mode 0).
06:59:20 corosync [IPC ] you are using ipc api v2
06:59:20 corosync [TOTEM ] Receive multicast socket recv buffer size (320000
bytes).
06:59:20 corosync [TOTEM ] Transmit multicast socket send buffer size (320000
bytes).
06:59:20 corosync [TOTEM ] Local receive multicast loop socket recv buffer
size (320000 bytes).
06:59:20 corosync [TOTEM ] Local transmit multicast loop socket send buffer
size (320000 bytes).
06:59:20 corosync [TOTEM ] The network interface [10.87.79.59] is now up.
06:59:20 corosync [TOTEM ] Created or loaded sequence id 6984.10.87.79.59 for
this ring.
Set r/w permissions for uid=0, gid=0 on /var/log/corosync.log
06:59:20 corosync [pcmk ] Logging: Initialized pcmk_startup
Set r/w permissions for uid=0, gid=0 on /var/log/corosync.log
06:59:20 corosync [SERV ] Service engine loaded: Pacemaker Cluster Manager
1.1.6
06:59:20 corosync [pcmk ] Logging: Initialized pcmk_startup
06:59:20 corosync [SERV ] Service engine loaded: Pacemaker Cluster Manager
1.1.6
06:59:20 corosync [SERV ] Service engine loaded: corosync extended virtual
synchrony service
06:59:20 corosync [SERV ] Service engine loaded: corosync configuration
service
06:59:20 corosync [SERV ] Service engine loaded: corosync cluster closed
process group service v1.01
06:59:20 corosync [SERV ] Service engine loaded: corosync cluster config
database access v1.01
06:59:20 corosync [SERV ] Service engine loaded: corosync profile loading
service
06:59:20 corosync [SERV ] Service engine loaded: corosync cluster quorum
service v0.1
06:59:20 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and
V2 of the synchronization engine.
06:59:20 corosync [TOTEM ] entering GATHER state from 15.
06:59:20 corosync [TOTEM ] Creating commit token because I am the rep.
06:59:20 corosync [TOTEM ] Saving state aru 0 high seq received 0
06:59:20 corosync [TOTEM ] Storing new sequence id for ring 1b4c
06:59:20 corosync [TOTEM ] entering COMMIT state.
06:59:20 corosync [TOTEM ] got commit token
06:59:20 corosync [TOTEM ] entering RECOVERY state.
06:59:20 corosync [TOTEM ] position [0] member 10.87.79.59:
06:59:20 corosync [TOTEM ] previous ring seq 6984 rep 10.87.79.59
06:59:20 corosync [TOTEM ] aru 0 high delivered 0 received flag 1
06:59:20 corosync [TOTEM ] Did not need to originate any messages in recovery.
06:59:20 corosync [TOTEM ] got commit token
06:59:20 corosync [TOTEM ] Sending initial ORF token
06:59:20 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0
retrans queue empty 1 count 0, aru 0
06:59:20 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
06:59:20 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0
retrans queue empty 1 count 1, aru 0
06:59:20 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
06:59:20 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0
retrans queue empty 1 count 2, aru 0
06:59:20 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
06:59:20 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0
retrans queue empty 1 count 3, aru 0
06:59:20 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
06:59:20 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru
0 0
06:59:20 corosync [TOTEM ] Resetting old ring state
06:59:20 corosync [TOTEM ] recovery to regular 1-0
06:59:20 corosync [TOTEM ] waiting_trans_ack changed to 1
06:59:20 corosync [SYNC ] This node is within the primary component and will
provide service.
06:59:20 corosync [TOTEM ] entering OPERATIONAL state.
06:59:20 corosync [TOTEM ] A processor joined or left the membership and a new
membership was formed.
06:59:20 corosync [SYNC ] confchg entries 1
06:59:20 corosync [SYNC ] Barrier Start Received From 1003428268
06:59:20 corosync [SYNC ] Barrier completion status for nodeid 1003428268 =
1.
06:59:20 corosync [SYNC ] Synchronization barrier completed
06:59:20 corosync [SYNC ] Synchronization actions starting for (dummy CLM
service)
06:59:20 corosync [SYNC ] confchg entries 1
06:59:20 corosync [SYNC ] Barrier Start Received From 1003428268
06:59:20 corosync [SYNC ] Barrier completion status for nodeid 1003428268 =
1.
06:59:20 corosync [SYNC ] Synchronization barrier completed
06:59:20 corosync [SYNC ] Committing synchronization for (dummy CLM service)
06:59:20 corosync [SYNC ] Synchronization actions starting for (dummy AMF
service)
06:59:20 corosync [SYNC ] confchg entries 1
06:59:20 corosync [SYNC ] Barrier Start Received From 1003428268
06:59:20 corosync [SYNC ] Barrier completion status for nodeid 1003428268 =
1.
06:59:20 corosync [SYNC ] Synchronization barrier completed
06:59:20 corosync [SYNC ] Committing synchronization for (dummy AMF service)
06:59:20 corosync [SYNC ] Synchronization actions starting for (dummy CKPT
service)
06:59:20 corosync [SYNC ] confchg entries 1
06:59:20 corosync [SYNC ] Barrier Start Received From 1003428268
06:59:20 corosync [SYNC ] Barrier completion status for nodeid 1003428268 =
1.
06:59:20 corosync [SYNC ] Synchronization barrier completed
06:59:20 corosync [SYNC ] Committing synchronization for (dummy CKPT service)
06:59:20 corosync [SYNC ] Synchronization actions starting for (dummy EVT
service)
06:59:20 corosync [SYNC ] confchg entries 1
06:59:20 corosync [SYNC ] Barrier Start Received From 1003428268
06:59:20 corosync [SYNC ] Barrier completion status for nodeid 1003428268 =
1.
06:59:20 corosync [SYNC ] Synchronization barrier completed
06:59:20 corosync [SYNC ] Committing synchronization for (dummy EVT service)
06:59:20 corosync [SYNC ] Synchronization actions starting for (corosync
cluster closed process group service v1.01)
06:59:20 corosync [CPG ] comparing: sender r(0) ip(10.87.79.59) ;
members(old:0 left:0)
06:59:20 corosync [CPG ] chosen downlist: sender r(0) ip(10.87.79.59) ;
members(old:0 left:0)
06:59:20 corosync [SYNC ] confchg entries 1
06:59:20 corosync [SYNC ] Barrier Start Received From 1003428268
06:59:20 corosync [SYNC ] Barrier completion status for nodeid 1003428268 =
1.
06:59:20 corosync [SYNC ] Synchronization barrier completed
06:59:20 corosync [SYNC ] Committing synchronization for (corosync cluster
closed process group service v1.01)
06:59:20 corosync [MAIN ] Completed service synchronization, ready to provide
service.
06:59:20 corosync [TOTEM ] waiting_trans_ack changed to 0
06:59:20 node03 lrmd: [14934]: info: G_main_add_SignalHandler: Added signal
handler for signal 15
06:59:20 node03 lrmd: [14934]: info: G_main_add_SignalHandler: Added signal
handler for signal 17
06:59:20 node03 lrmd: [14934]: info: enabling coredumps
06:59:20 node03 lrmd: [14934]: info: G_main_add_SignalHandler: Added signal
handler for signal 10
06:59:20 node03 lrmd: [14934]: info: G_main_add_SignalHandler: Added signal
handler for signal 12
06:59:20 node03 lrmd: [14934]: debug: main: run the loop...
06:59:20 node03 lrmd: [14934]: info: Started.
06:59:20 [14935] node03 attrd: info: crm_log_init_worker: Changed
active directory to /var/lib/heartbeat/cores/hacluster
06:59:20 [14935] node03 attrd: info: main: Starting up
06:59:20 [14935] node03 attrd: info: get_cluster_type: Cluster type
is: 'openais'
06:59:20 [14935] node03 attrd: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
06:59:20 [14936] node03 pengine: info: crm_log_init_worker: Changed
active directory to /var/lib/heartbeat/cores/hacluster
06:59:20 [14935] node03 attrd: info: init_ais_connection_classic:
Creating connection to our Corosync plugin
06:59:20 [14936] node03 pengine: debug: main: Checking for old
instances of pengine
06:59:20 [14937] node03 crmd: info: crm_log_init_worker: Changed
active directory to /var/lib/heartbeat/cores/hacluster
06:59:20 [14936] node03 pengine: debug: init_client_ipc_comms_nodispatch:
Attempting to talk on: /var/run/crm/pengine
06:59:20 [14937] node03 crmd: notice: main: CRM Hg Version:
148fccfd5985c5590cc601123c6c16e966b85d14
06:59:20 [14936] node03 pengine: debug: init_client_ipc_comms_nodispatch:
Could not init comms on: /var/run/crm/pengine
06:59:20 [14936] node03 pengine: debug: main: Init server comms
06:59:20 [14936] node03 pengine: info: main: Starting pengine
06:59:20 [14937] node03 crmd: debug: crmd_init: Starting crmd
06:59:20 [14937] node03 crmd: debug: s_crmd_fsa: Processing
I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
06:59:20 [14937] node03 crmd: debug: do_fsa_action: actions:trace:
// A_LOG
06:59:20 [14937] node03 crmd: debug: do_log: FSA: Input I_STARTUP
from crmd_init() received in state S_STARTING
06:59:20 [14937] node03 crmd: debug: do_fsa_action: actions:trace:
// A_STARTUP
06:59:20 [14937] node03 crmd: debug: do_startup: Registering
Signal Handlers
06:59:20 [14937] node03 crmd: debug: do_startup: Creating CIB
and LRM objects
06:59:20 [14937] node03 crmd: debug: do_fsa_action: actions:trace:
// A_CIB_START
06:59:20 [14937] node03 crmd: debug: init_client_ipc_comms_nodispatch:
Attempting to talk on: /var/run/crm/cib_rw
06:59:20 [14937] node03 crmd: debug: init_client_ipc_comms_nodispatch:
Could not init comms on: /var/run/crm/cib_rw
06:59:20 [14937] node03 crmd: debug: cib_native_signon_raw:
Connection to command channel failed
06:59:20 [14937] node03 crmd: debug: init_client_ipc_comms_nodispatch:
Attempting to talk on: /var/run/crm/cib_callback
06:59:20 [14937] node03 crmd: debug: init_client_ipc_comms_nodispatch:
Could not init comms on: /var/run/crm/cib_callback
06:59:20 [14937] node03 crmd: debug: cib_native_signon_raw:
Connection to callback channel failed
06:59:20 [14937] node03 crmd: debug: cib_native_signon_raw:
Connection to CIB failed: connection failed
06:59:20 [14937] node03 crmd: debug: cib_native_signoff: Signing
out of the CIB Service
06:59:20 [14935] node03 attrd: debug: init_ais_connection_classic:
Adding fd=6 to mainloop
06:59:20 [14935] node03 attrd: info: init_ais_connection_classic:
AIS connection established
06:59:20 [14935] node03 attrd: info: get_ais_nodeid: Server details:
id=1003428268 uname=node03 cname=pcmk
06:59:20 [14935] node03 attrd: info: init_ais_connection_once:
Connection to 'classic openais (with plugin)': established
06:59:20 [14935] node03 attrd: debug: crm_new_peer: Creating entry
for node node03/1003428268
06:59:20 [14935] node03 attrd: info: crm_new_peer: Node node03 now
has id: 1003428268
06:59:20 [14935] node03 attrd: info: crm_new_peer: Node 1003428268
is now known as node03
06:59:20 [14935] node03 attrd: info: main: Cluster connection
active
06:59:20 [14935] node03 attrd: info: main: Accepting attribute
updates
06:59:20 [14935] node03 attrd: notice: main: Starting mainloop...
06:59:20 [14933] node03 stonith-ng: info: crm_log_init_worker: Changed
active directory to /var/lib/heartbeat/cores/root
06:59:20 [14933] node03 stonith-ng: info: get_cluster_type: Cluster type
is: 'openais'
06:59:20 [14933] node03 stonith-ng: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
06:59:20 [14933] node03 stonith-ng: info: init_ais_connection_classic:
Creating connection to our Corosync plugin
06:59:20 [14932] node03 cib: info: crm_log_init_worker: Changed
active directory to /var/lib/heartbeat/cores/hacluster
06:59:20 [14932] node03 cib: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk] <cib
epoch="251" num_updates="0" admin_epoch="1" validate-with="pacemaker-1.2"
crm_feature_set="3.0.6" update-origin="node03" update-client="crmd"
cib-last-written="Tue Apr 9 06:48:33 2013" have-quorum="1" >
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<configuration >
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<crm_config >
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<cluster_property_set id="cib-bootstrap-options" >
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<nvpair id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="1000" />
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy"
value="ignore" />
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled"
value="false" />
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<nvpair id="cib-bootstrap-options-expected-quorum-votes"
name="expected-quorum-votes" value="3" />
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" />
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="openais" />
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh"
value="1365160119" />
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
</cluster_property_set>
06:59:20 [14932] node03 cib: debug: readCibXmlFile: [on-disk]
</crm_config>
…
We are still seeing the extra pacemaker daemons when corosync starts up.
As an added check, all pacemaker daemons exited correctly when stopping
corosync. lrmd attempts to start twice:
ps aux | grep lrmd
root 16412 0.0 0.0 0 0 ? Z 07:20 0:00 [lrmd]
<defunct>
root 16419 0.0 0.0 34240 1052 ? S 07:20 0:00
/usr/lib64/heartbeat/lrmd
root 21030 0.0 0.0 103244 856 pts/0 S+ 08:37 0:00 grep lrmd
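For completeness, a sketch of checks (assuming a stock EL6 init setup) to
confirm nothing else is launching a second set of daemons:

# Is the pacemaker init script enabled at boot alongside the ver: 0 plugin?
chkconfig --list | grep -E 'corosync|pacemaker'
# Are the daemons children of corosync (plugin-started) or of something else?
ps axf | grep -E 'corosync|pacemaker|lrmd' | grep -v grep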
Any help resolving this issue would be appreciated.
Cheers,
Jimmy.
On 9 Apr 2013, at 00:16, Andrew Beekhof <[email protected]> wrote:
>
> On 08/04/2013, at 9:44 PM, Jimmy Magee <[email protected]> wrote:
>
>> Hi Andrew,
>>
>> Thanks for your reply. We are running at debug level with the following
>> logging config in corosync.conf:
>>
>> logging {
>> fileline: off
>> to_syslog: yes
>> to_stderr: no
>> syslog_facility: daemon
>> debug: on
>> timestamp: on
>> }
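>>
>> (For logging to a file as well, a sketch of the additional options this
>> block takes, per the corosync.conf(5) man page:
>>
>> to_logfile: yes
>> logfile: /var/log/corosync.log
>> )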
>>
>> Looking at the issue further, there seem to be two instances of some
>> pacemaker daemons running on this particular node:
>>
>>
>> ps aux | grep pace
>>
>> 495 3050 0.2 0.0 89956 7184 ? S 07:10 0:01
>> /usr/libexec/pacemaker/cib
>> root 3051 0.0 0.0 87128 3152 ? S 07:10 0:00
>> /usr/libexec/pacemaker/stonithd
>> 495 3053 0.0 0.0 91188 2840 ? S 07:10 0:00
>> /usr/libexec/pacemaker/attrd
>> 495 3054 0.0 0.0 87336 2484 ? S 07:10 0:00
>> /usr/libexec/pacemaker/pengine
>> 495 3055 0.0 0.0 91332 3156 ? S 07:10 0:00
>> /usr/libexec/pacemaker/crmd
>> 495 3057 0.0 0.0 88876 5224 ? S 07:10 0:00
>> /usr/libexec/pacemaker/cib
>> root 3058 0.0 0.0 87128 3132 ? S 07:10 0:00
>> /usr/libexec/pacemaker/stonithd
>> 495 3060 0.0 0.0 91188 2788 ? S 07:10 0:00
>> /usr/libexec/pacemaker/attrd
>> 495 3062 0.0 0.0 91436 3932 ? S 07:10 0:00
>> /usr/libexec/pacemaker/crmd
>>
>>
>> ps aux | grep corosync
>> root 3044 0.1 0.0 977852 9264 ? Ssl 07:10 0:01 corosync
>> root 9363 0.0 0.0 103248 856 pts/0 S+ 07:33 0:00 grep
>> corosync
>>
>>
>> ps aux | grep lrmd
>> root 3052 0.0 0.0 76464 2528 ? S 07:10 0:00
>> /usr/lib64/heartbeat/lrmd
>>
>>
>> We're not sure why this is the case. Any help would be appreciated.
>>
>
> Have you perhaps specified "ver: 0" for the pacemaker plugin and run
> "service pacemaker start"?
>
>> Cheers,
>> Jimmy.
>>
>>
>>
>>
>>
>> On 8 Apr 2013, at 03:00, Andrew Beekhof <[email protected]> wrote:
>>
>>> This doesn't look promising:
>>>
>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>> signal 15
>>> lrmd: [4946]: info: Signal sent to pid=4939, waiting for process to exit
>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>> signal 17
>>> lrmd: [4939]: info: enabling coredumps
>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>> signal 10
>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>> signal 12
>>> lrmd: [4939]: info: Started.
>>> lrmd: [4939]: info: lrmd is shutting down
>>>
>>> The lrmd comes up but then immediately shuts down.
>>> Perhaps try enabling debug to see if that sheds any light.
>>>
>>> On 06/04/2013, at 4:58 AM, Jimmy Magee <[email protected]> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Apologies for reposting this query; it inadvertently got added to an
>>>> existing topic!
>>>>
>>>>
>>>> We have a three-node cluster deployed in a customer's network:
>>>> - 2 nodes are on the same switch.
>>>> - The 3rd node is on the same subnet, but there's a router in between.
>>>> - IP multicast is enabled and has been tested using omping as follows:
>>>>
>>>> On each node we ran:
>>>>
>>>> omping node01 node02 node03
>>>>
>>>>
>>>> On node 3
>>>>
>>>> Node01 : unicast, xmt/rcv/%loss = 23/23/0%, min/avg/max/std-dev =
>>>> 0.128/0.181/0.255/0.025
>>>> Node01 : multicast, xmt/rcv/%loss = 23/23/0%, min/avg/max/std-dev =
>>>> 0.140/0.187/0.219/0.021
>>>> Node02 : unicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev =
>>>> 0.115/0.150/0.168/0.021
>>>> Node02 : multicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev =
>>>> 0.134/0.162/0.177/0.014
>>>>
>>>>
>>>> On node 2
>>>>
>>>>
>>>> Node01 : unicast, xmt/rcv/%loss = 9/9/0%, min/avg/max/std-dev =
>>>> 0.168/0.191/0.205/0.014
>>>> Node01 : multicast, xmt/rcv/%loss = 9/8/11% (seq>=2 0%),
>>>> min/avg/max/std-dev = 0.138/0.179/0.206/0.028
>>>> Node03 : unicast, xmt/rcv/%loss = 9/9/0%, min/avg/max/std-dev =
>>>> 0.112/0.149/0.175/0.022
>>>> Node03 : multicast, xmt/rcv/%loss = 9/8/11% (seq>=2 0%),
>>>> min/avg/max/std-dev = 0.124/0.167/0.178/0.018
>>>>
>>>>
>>>>
>>>> On node 1
>>>>
>>>> Node02 : unicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev =
>>>> 0.154/0.185/0.208/0.019
>>>> Node02 : multicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev =
>>>> 0.175/0.198/0.214/0.015
>>>> Node03 : unicast, xmt/rcv/%loss = 23/23/0%, min/avg/max/std-dev =
>>>> 0.114/0.160/0.185/0.019
>>>> Node03 : multicast, xmt/rcv/%loss = 23/22/4% (seq>=2 0%),
>>>> min/avg/max/std-dev = 0.124/0.172/0.197/0.019
>>>>
>>>>
>>>> - The problem is intermittent but frequent; the cluster occasionally
>>>> starts fine from scratch.
>>>>
>>>> We suspect the problem is related to node 3, as we can see lrmd failures
>>>> in the attached log. We've checked that permissions are OK, as per
>>>> https://bugs.launchpad.net/ubuntu/+source/cluster-glue/+bug/676391
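>>>>
>>>> A sketch of that permissions check (directories per the cluster-glue
>>>> defaults that appear in our logs):
>>>>
>>>> ls -ld /var/lib/heartbeat /var/lib/heartbeat/cores \
>>>>        /var/lib/heartbeat/cores/hacluster /var/lib/heartbeat/cores/root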
>>>>
>>>>
>>>>
>>>> stonith-ng[1437]: error: ais_dispatch: AIS connection failed
>>>> stonith-ng[1437]: error: stonith_peer_ais_destroy: AIS connection
>>>> terminated
>>>> corosync[1430]: [SERV ] Service engine unloaded: Pacemaker Cluster
>>>> Manager 1.1.6
>>>> corosync[1430]: [SERV ] Service engine unloaded: corosync extended
>>>> virtual synchrony service
>>>> corosync[1430]: [SERV ] Service engine unloaded: corosync configuration
>>>> service
>>>> corosync[1430]: [SERV ] Service engine unloaded: corosync cluster
>>>> closed process group service v1.01
>>>> corosync[1430]: [SERV ] Service engine unloaded: corosync cluster
>>>> config database access v1.01
>>>> corosync[1430]: [SERV ] Service engine unloaded: corosync profile
>>>> loading service
>>>> corosync[1430]: [SERV ] Service engine unloaded: corosync cluster
>>>> quorum service v0.1
>>>> corosync[1430]: [MAIN ] Corosync Cluster Engine exiting with status 0
>>>> at main.c:1894.
>>>>
>>>> corosync[4931]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
>>>> corosync[4931]: [MAIN ] Successfully read main configuration file
>>>> '/etc/corosync/corosync.conf'.
>>>> corosync[4931]: [TOTEM ] Initializing transport (UDP/IP Multicast).
>>>> corosync[4931]: [TOTEM ] Initializing transmit/receive security:
>>>> libtomcrypt SOBER128/SHA1HMAC (mode 0).
>>>> corosync[4931]: [TOTEM ] The network interface [10.87.79.59] is now up.
>>>> corosync[4931]: [pcmk ] Logging: Initialized pcmk_startup
>>>> corosync[4931]: [SERV ] Service engine loaded: Pacemaker Cluster
>>>> Manager 1.1.6
>>>> corosync[4931]: [pcmk ] Logging: Initialized pcmk_startup
>>>> corosync[4931]: [SERV ] Service engine loaded: Pacemaker Cluster
>>>> Manager 1.1.6
>>>> corosync[4931]: [SERV ] Service engine loaded: corosync extended
>>>> virtual synchrony service
>>>> corosync[4931]: [SERV ] Service engine loaded: corosync configuration
>>>> service
>>>> corosync[4931]: [SERV ] Service engine loaded: corosync cluster closed
>>>> process group service v1.01
>>>> corosync[4931]: [SERV ] Service engine loaded: corosync cluster config
>>>> database access v1.01
>>>> corosync[4931]: [SERV ] Service engine loaded: corosync profile loading
>>>> service
>>>> corosync[4931]: [SERV ] Service engine loaded: corosync cluster quorum
>>>> service v0.1
>>>> corosync[4931]: [MAIN ] Compatibility mode set to whitetank. Using V1
>>>> and V2 of the synchronization engine.
>>>> corosync[4931]: [TOTEM ] A processor joined or left the membership and a
>>>> new membership was formed.
>>>> corosync[4931]: [CPG ] chosen downlist: sender r(0) ip(10.87.79.59) ;
>>>> members(old:0 left:0)
>>>> corosync[4931]: [MAIN ] Completed service synchronization, ready to
>>>> provide service.
>>>> cib[4937]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> cib[4937]: info: retrieveCib: Reading cluster configuration from:
>>>> /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
>>>> cib[4937]: info: validate_with_relaxng: Creating RNG parser context
>>>> stonith-ng[4945]: info: crm_log_init_worker: Changed active directory
>>>> to /var/lib/heartbeat/cores/root
>>>> stonith-ng[4945]: info: get_cluster_type: Cluster type is: 'openais'
>>>> stonith-ng[4945]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> stonith-ng[4945]: info: init_ais_connection_classic: Creating
>>>> connection to our Corosync plugin
>>>> cib[4944]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> cib[4944]: info: retrieveCib: Reading cluster configuration from:
>>>> /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
>>>> stonith-ng[4945]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> stonith-ng[4945]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=w0110Danmtapp03 cname=pcmk
>>>> stonith-ng[4945]: info: init_ais_connection_once: Connection to
>>>> 'classic openais (with plugin)': established
>>>> stonith-ng[4945]: info: crm_new_peer: Node node03 now has id:
>>>> 1003428268
>>>> stonith-ng[4945]: info: crm_new_peer: Node 1003428268 is now known as
>>>> node03
>>>> cib[4944]: info: validate_with_relaxng: Creating RNG parser context
>>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>>> signal 15
>>>> lrmd: [4946]: info: Signal sent to pid=4939, waiting for process to exit
>>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>>> signal 17
>>>> lrmd: [4939]: info: enabling coredumps
>>>> stonith-ng[4938]: info: crm_log_init_worker: Changed active directory
>>>> to /var/lib/heartbeat/cores/root
>>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>>> signal 10
>>>> lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for
>>>> signal 12
>>>> lrmd: [4939]: info: Started.
>>>> stonith-ng[4938]: info: get_cluster_type: Cluster type is: 'openais'
>>>> lrmd: [4939]: info: lrmd is shutting down
>>>> stonith-ng[4938]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> stonith-ng[4938]: info: init_ais_connection_classic: Creating
>>>> connection to our Corosync plugin
>>>> attrd[4940]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> pengine[4941]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> attrd[4940]: info: main: Starting up
>>>> attrd[4940]: info: get_cluster_type: Cluster type is: 'openais'
>>>> attrd[4940]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> attrd[4940]: info: init_ais_connection_classic: Creating connection to
>>>> our Corosync plugin
>>>> crmd[4942]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> pengine[4941]: info: main: Starting pengine
>>>> crmd[4942]: notice: main: CRM Hg Version:
>>>> 148fccfd5985c5590cc601123c6c16e966b85d14
>>>> pengine[4948]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> pengine[4948]: warning: main: Terminating previous PE instance
>>>> attrd[4947]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> pengine[4941]: warning: process_pe_message: Received quit message,
>>>> terminating
>>>> attrd[4947]: info: main: Starting up
>>>> attrd[4947]: info: get_cluster_type: Cluster type is: 'openais'
>>>> attrd[4947]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> attrd[4947]: info: init_ais_connection_classic: Creating connection to
>>>> our Corosync plugin
>>>> crmd[4949]: info: crm_log_init_worker: Changed active directory to
>>>> /var/lib/heartbeat/cores/hacluster
>>>> crmd[4949]: notice: main: CRM Hg Version:
>>>> 148fccfd5985c5590cc601123c6c16e966b85d14
>>>> stonith-ng[4938]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> stonith-ng[4938]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=node03 cname=pcmk
>>>> stonith-ng[4938]: info: init_ais_connection_once: Connection to
>>>> 'classic openais (with plugin)': established
>>>> stonith-ng[4938]: info: crm_new_peer: Node node03 now has id:
>>>> 1003428268
>>>> stonith-ng[4938]: info: crm_new_peer: Node 1003428268 is now known as
>>>> node03
>>>> attrd[4940]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> attrd[4940]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=node03 cname=pcmk
>>>> attrd[4940]: info: init_ais_connection_once: Connection to 'classic
>>>> openais (with plugin)': established
>>>> attrd[4940]: info: crm_new_peer: Node node03 now has id: 1003428268
>>>> attrd[4940]: info: crm_new_peer: Node 1003428268 is now known as node03
>>>> attrd[4940]: info: main: Cluster connection active
>>>> attrd[4940]: info: main: Accepting attribute updates
>>>> attrd[4940]: notice: main: Starting mainloop...
>>>> attrd[4947]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> attrd[4947]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=node03 cname=pcmk
>>>> attrd[4947]: info: init_ais_connection_once: Connection to 'classic
>>>> openais (with plugin)': established
>>>> attrd[4947]: info: crm_new_peer: Node node03 now has id: 1003428268
>>>> attrd[4947]: info: crm_new_peer: Node 1003428268 is now known as node03
>>>> attrd[4947]: info: main: Cluster connection active
>>>> attrd[4947]: info: main: Accepting attribute updates
>>>> attrd[4947]: notice: main: Starting mainloop...
>>>> cib[4937]: info: startCib: CIB Initialization completed successfully
>>>> cib[4937]: info: get_cluster_type: Cluster type is: 'openais'
>>>> cib[4937]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> cib[4937]: info: init_ais_connection_classic: Creating connection to
>>>> our Corosync plugin
>>>> cib[4944]: info: startCib: CIB Initialization completed successfully
>>>> cib[4944]: info: get_cluster_type: Cluster type is: 'openais'
>>>> cib[4944]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> cib[4944]: info: init_ais_connection_classic: Creating connection to
>>>> our Corosync plugin
>>>> cib[4937]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> cib[4937]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=node03 cname=pcmk
>>>> cib[4937]: info: init_ais_connection_once: Connection to 'classic
>>>> openais (with plugin)': established
>>>> cib[4937]: info: crm_new_peer: Node node03 now has id: 1003428268
>>>> cib[4937]: info: crm_new_peer: Node 1003428268 is now known as node03
>>>> cib[4937]: info: cib_init: Starting cib mainloop
>>>> cib[4937]: info: ais_dispatch_message: Membership 6892: quorum still
>>>> lost
>>>> cib[4937]: info: crm_update_peer: Node node03: id=1003428268
>>>> state=member (new) addr=r(0) ip(10.87.79.59) (new) votes=1 (new) born=0
>>>> seen=6892 proc=00000000000000000000000000111312 (new)
>>>> cib[4944]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> cib[4944]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=node03 cname=pcmk
>>>> cib[4944]: info: init_ais_connection_once: Connection to 'classic
>>>> openais (with plugin)': established
>>>> cib[4944]: info: crm_new_peer: Node node03 now has id: 1003428268
>>>> cib[4944]: info: crm_new_peer: Node 1003428268 is now known as node03
>>>> cib[4944]: info: cib_init: Starting cib mainloop
>>>> stonith-ng[4945]: notice: setup_cib: Watching for stonith topology
>>>> changes
>>>> stonith-ng[4945]: info: main: Starting stonith-ng mainloop
>>>> cib[4937]: info: ais_dispatch_message: Membership 6896: quorum still
>>>> lost
>>>> corosync[4931]: [TOTEM ] A processor joined or left the membership and a
>>>> new membership was formed.
>>>> cib[4937]: info: crm_new_peer: Node <null> now has id: 969873836
>>>> cib[4937]: info: crm_update_peer: Node (null): id=969873836
>>>> state=member (new) addr=r(0) ip(172.25.207.57) votes=0 born=0 seen=6896
>>>> proc=00000000000000000000000000000000
>>>> cib[4937]: info: crm_new_peer: Node <null> now has id: 986651052
>>>> cib[4937]: info: crm_update_peer: Node (null): id=986651052
>>>> state=member (new) addr=r(0) ip(172.25.207.58) votes=0 born=0 seen=6896
>>>> proc=00000000000000000000000000000000
>>>> cib[4937]: notice: ais_dispatch_message: Membership 6896: quorum acquired
>>>> cib[4937]: info: crm_get_peer: Node 986651052 is now known as node02
>>>> cib[4937]: info: crm_update_peer: Node node02: id=986651052
>>>> state=member addr=r(0) ip(172.25.207.58) votes=1 (new) born=6812
>>>> seen=6896 proc=00000000000000000000000000111312 (new)
>>>> cib[4937]: info: ais_dispatch_message: Membership 6896: quorum retained
>>>> cib[4937]: info: crm_get_peer: Node 969873836 is now known as node01
>>>> cib[4937]: info: crm_update_peer: Node node01: id=969873836
>>>> state=member addr=r(0) ip(172.25.207.57) votes=1 (new) born=6848
>>>> seen=6896 proc=00000000000000000000000000111312 (new)
>>>> rsyslogd-2177: imuxsock begins to drop messages from pid 4931 due to
>>>> rate-limiting
>>>> crmd[4942]: info: do_cib_control: CIB connection established
>>>> crmd[4942]: info: get_cluster_type: Cluster type is: 'openais'
>>>> crmd[4942]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> crmd[4942]: info: init_ais_connection_classic: Creating connection to
>>>> our Corosync plugin
>>>> cib[4937]: info: cib_process_diff: Diff 1.249.28 -> 1.249.29 not
>>>> applied to 1.249.0: current "num_updates" is less than required
>>>> cib[4937]: info: cib_server_process_diff: Requesting re-sync from peer
>>>> crmd[4949]: info: do_cib_control: CIB connection established
>>>> crmd[4949]: info: get_cluster_type: Cluster type is: 'openais'
>>>> crmd[4949]: notice: crm_cluster_connect: Connecting to cluster
>>>> infrastructure: classic openais (with plugin)
>>>> crmd[4949]: info: init_ais_connection_classic: Creating connection to
>>>> our Corosync plugin
>>>> stonith-ng[4938]: notice: setup_cib: Watching for stonith topology
>>>> changes
>>>> stonith-ng[4938]: info: main: Starting stonith-ng mainloop
>>>> cib[4937]: notice: cib_server_process_diff: Not applying diff 1.249.29
>>>> -> 1.249.30 (sync in progress)
>>>> crmd[4942]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> crmd[4942]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=node03 cname=pcmk
>>>> crmd[4942]: info: init_ais_connection_once: Connection to 'classic
>>>> openais (with plugin)': established
>>>> crmd[4942]: info: crm_new_peer: Node node03 now has id: 1003428268
>>>> crmd[4942]: info: crm_new_peer: Node 1003428268 is now known as node03
>>>> crmd[4942]: info: ais_status_callback: status: node03 is now unknown
>>>> crmd[4942]: info: do_ha_control: Connected to the cluster
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30
>>>> max) times
>>>> crmd[4949]: info: init_ais_connection_classic: AIS connection
>>>> established
>>>> crmd[4949]: info: get_ais_nodeid: Server details: id=1003428268
>>>> uname=node03 cname=pcmk
>>>> crmd[4949]: info: init_ais_connection_once: Connection to 'classic
>>>> openais (with plugin)': established
>>>> crmd[4942]: notice: ais_dispatch_message: Membership 6896: quorum
>>>> acquired
>>>> crmd[4949]: info: crm_new_peer: Node node03 now has id: 1003428268
>>>> crmd[4949]: info: crm_new_peer: Node 1003428268 is now known as node03
>>>> crmd[4942]: info: crm_new_peer: Node node01 now has id: 969873836
>>>> crmd[4949]: info: ais_status_callback: status: node03 is now unknown
>>>> crmd[4942]: info: crm_new_peer: Node 969873836 is now known as node01
>>>> crmd[4949]: info: do_ha_control: Connected to the cluster
>>>> crmd[4942]: info: ais_status_callback: status: node01 is now unknown
>>>> crmd[4942]: info: ais_status_callback: status: node01 is now member
>>>> (was unknown)
>>>> crmd[4942]: info: crm_update_peer: Node node01: id=969873836
>>>> state=member (new) addr=r(0) ip(172.25.207.57) votes=1 born=6848
>>>> seen=6896 proc=00000000000000000000000000111312
>>>> crmd[4942]: info: crm_new_peer: Node node02 now has id: 986651052
>>>> crmd[4942]: info: crm_new_peer: Node 986651052 is now known as node02
>>>> crmd[4942]: info: ais_status_callback: status: node02 is now unknown
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30
>>>> max) times
>>>> crmd[4942]: info: ais_status_callback: status: node02 is now member
>>>> (was unknown)
>>>> crmd[4942]: info: crm_update_peer: Node node02: id=986651052
>>>> state=member (new) addr=r(0) ip(172.25.207.58) votes=1 born=6812
>>>> seen=6896 proc=00000000000000000000000000111312
>>>> crmd[4942]: notice: crmd_peer_update: Status update: Client node03/crmd
>>>> now has status [online] (DC=<null>)
>>>> crmd[4942]: info: ais_status_callback: status: node03 is now member
>>>> (was unknown)
>>>> crmd[4942]: info: crm_update_peer: Node node03: id=1003428268
>>>> state=member (new) addr=r(0) ip(10.87.79.59) (new) votes=1 (new)
>>>> born=6896 seen=6896 proc=00000000000000000000000000111312 (new)
>>>> crmd[4942]: info: ais_dispatch_message: Membership 6896: quorum
>>>> retained
>>>> cib[4937]: notice: cib_server_process_diff: Not applying diff 1.249.30
>>>> -> 1.249.31 (sync in progress)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 2 (30
>>>> max) times
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 3 (30
>>>> max) times
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 2 (30
>>>> max) times
>>>> crmd[4949]: notice: ais_dispatch_message: Membership 6896: quorum
>>>> acquired
>>>> rsyslogd-2177: imuxsock begins to drop messages from pid 4937 due to
>>>> rate-limiting
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 4 (30
>>>> max) times
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 5 (30
>>>> max) times
>>>> pengine[4948]: info: main: Starting pengine
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> warning: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 3 (30
>>>> max) times
>>>> attrd[4940]: info: cib_connect: Connected to the CIB after 1 signon
>>>> attempts
>>>> attrd[4940]: info: cib_connect: Sending full refresh
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 7 (30
>>>> max) times
>>>> attrd[4947]: info: cib_connect: Connected to the CIB after 1 signon
>>>> attempts
>>>> attrd[4947]: info: cib_connect: Sending full refresh
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 4 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 8 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 5 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 9 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 6 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 10 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 7 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 11 (30
>>>> max) times
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 8 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 12 (30
>>>> max) times
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 9 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 13 (30
>>>> max) times
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 10 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 14 (30
>>>> max) times
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 11 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 12 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 15 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 13 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 16 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 14 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 17 (30
>>>> max) times
>>>> crmd[4949]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4949]: warning: do_lrm_control: Failed to sign on to the LRM 15 (30
>>>> max) times
>>>> crmd[4942]: info: crm_timer_popped: Wait Timer (I_NULL) just popped
>>>> (2000ms)
>>>> crmd[4942]: warning: do_lrm_control: Failed to sign on to the LRM 18 (30
>>>> max) times
>>>>
>>>>
>>>> We have the following components installed:
>>>>
>>>>
>>>> corosynclib-1.4.1-15.el6.x86_64
>>>> corosync-1.4.1-15.el6.x86_64
>>>> cluster-glue-libs-1.0.5-6.el6.x86_64
>>>> clusterlib-3.0.12.1-49.el6.x86_64
>>>> pacemaker-cluster-libs-1.1.7-6.el6.x86_64
>>>> cluster-glue-1.0.5-6.el6.x86_64
>>>> resource-agents-3.9.2-12.el6.x86_64
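>>>>
>>>> (For reference, this inventory can be regenerated with something like
>>>> the following; the exact grep pattern is just a sketch:
>>>>
>>>> rpm -qa | grep -E 'corosync|pacemaker|cluster|glue|resource-agents')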
>>>>
>>>>
>>>>
>>>> We'd appreciate assistance with debugging this issue, and any pointers
>>>> to possible causes.
>>>>
>>>> Cheers,
>>>> Jimmy
>>>
>>
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems