Hi all,

I am trying to set up a basic Pacemaker 1.1.10 cluster on RHEL 6.5 with DRBD 8.3.16.

I've set up DRBD and configured one clustered LVM volume group using that DRBD resource as the PV. With DRBD alone configured, I can stop/start pacemaker repeatedly without issue. However, when I add the LVM VG using ocf:heartbeat:LVM and set up a constraint, subsequent restarts of pacemaker almost always end in a fence. I have to think, then, that I am messing up my constraints...

Config:

====
Cluster Name: an-anvil-04
Corosync Nodes:

Pacemaker Nodes:
 an-a04n01.alteeve.ca an-a04n02.alteeve.ca

Resources:
 Master: drbd_r0_Clone
  Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: drbd_r0 (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=r0
   Operations: monitor interval=30s (drbd_r0-monitor-interval-30s)
 Master: lvm_n01_vg0_Clone
  Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: lvm_n01_vg0 (class=ocf provider=heartbeat type=LVM)
   Attributes: volgrpname=an-a04n01_vg0
   Operations: monitor interval=30s (lvm_n01_vg0-monitor-interval-30s)

Stonith Devices:
 Resource: fence_n01_ipmi (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=an-a04n01.alteeve.ca ipaddr=an-a04n01.ipmi action=reboot login=admin passwd=Initial1 delay=15
  Operations: monitor interval=60s (fence_n01_ipmi-monitor-interval-60s)
 Resource: fence_n02_ipmi (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=an-a04n02.alteeve.ca ipaddr=an-a04n02.ipmi action=reboot login=admin passwd=Initial1
  Operations: monitor interval=60s (fence_n02_ipmi-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
promote drbd_r0_Clone then start lvm_n01_vg0_Clone (Mandatory) (id:order-drbd_r0_Clone-lvm_n01_vg0_Clone-mandatory)
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.10-14.el6_5.3-368c726
 last-lrm-refresh: 1403062921
 no-quorum-policy: ignore
 stonith-enabled: true
====
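
For reference, this is roughly how I created the resources and the order constraint (reconstructed from the 'pcs config' output above, so the exact flags may not be verbatim for this pcs version):

====
# Sketch only -- rebuilt from the config shown above; verify against
# 'pcs resource help' on your pcs version.
pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
    op monitor interval=30s
pcs resource master drbd_r0_Clone drbd_r0 \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs resource create lvm_n01_vg0 ocf:heartbeat:LVM volgrpname=an-a04n01_vg0 \
    op monitor interval=30s
pcs resource master lvm_n01_vg0_Clone lvm_n01_vg0 \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs constraint order promote drbd_r0_Clone then start lvm_n01_vg0_Clone
====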

Constraint:

====
Location Constraints:
Ordering Constraints:
promote drbd_r0_Clone then start lvm_n01_vg0_Clone (Mandatory) (id:order-drbd_r0_Clone-lvm_n01_vg0_Clone-mandatory)
Colocation Constraints:
====
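
One thing I wasn't sure about is whether the order constraint needs a colocation rule beside it, so that lvm_n01_vg0_Clone is only placed where drbd_r0_Clone is actually Master. Something like this, if I have the syntax right:

====
# Untested sketch -- colocate the LVM clone with the DRBD master role;
# check the exact argument order against 'pcs constraint help'.
pcs constraint colocation add lvm_n01_vg0_Clone with master drbd_r0_Clone INFINITY
====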

Logs from 'an-a04n01', starting with '/etc/init.d/pacemaker start' (this node always survives and fences 'an-a04n02'):

====
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Successfully parsed cman config
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] The network interface [10.20.40.1] is now up.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] Using quorum provider quorum_cman
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jun 17 23:55:32 an-a04n01 corosync[28088]: [CMAN ] CMAN 3.0.12.1 (built Apr 3 2014 05:12:26) started
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync configuration service
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync profile loading service
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] Using quorum provider quorum_cman
Jun 17 23:55:32 an-a04n01 corosync[28088]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 17 23:55:32 an-a04n01 corosync[28088]: [CMAN ] quorum regained, resuming activity
Jun 17 23:55:32 an-a04n01 corosync[28088]: [QUORUM] This node is within the primary component and will provide service.
Jun 17 23:55:32 an-a04n01 corosync[28088]:   [QUORUM] Members[1]: 1
Jun 17 23:55:32 an-a04n01 corosync[28088]:   [QUORUM] Members[1]: 1
Jun 17 23:55:32 an-a04n01 corosync[28088]: [CPG ] chosen downlist: sender r(0) ip(10.20.40.1) ; members(old:0 left:0)
Jun 17 23:55:32 an-a04n01 corosync[28088]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 17 23:55:33 an-a04n01 corosync[28088]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 17 23:55:33 an-a04n01 corosync[28088]:   [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n01 corosync[28088]:   [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n01 corosync[28088]: [CPG ] chosen downlist: sender r(0) ip(10.20.40.1) ; members(old:1 left:0)
Jun 17 23:55:33 an-a04n01 corosync[28088]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 17 23:55:36 an-a04n01 fenced[28143]: fenced 3.0.12.1 started
Jun 17 23:55:36 an-a04n01 dlm_controld[28169]: dlm_controld 3.0.12.1 started
Jun 17 23:55:37 an-a04n01 gfs_controld[28218]: gfs_controld 3.0.12.1 started
Jun 17 23:55:38 an-a04n01 pacemaker: Attempting to start clvmd
Jun 17 23:55:39 an-a04n01 kernel: dlm: Using TCP for communications
Jun 17 23:55:40 an-a04n01 kernel: dlm: connecting to 2
Jun 17 23:55:40 an-a04n01 clvmd: Cluster LVM daemon started - connected to CMAN
Jun 17 23:55:41 an-a04n01 pacemaker: Starting Pacemaker Cluster Manager
Jun 17 23:55:42 an-a04n01 pacemakerd[28349]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 pacemakerd[28349]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
Jun 17 23:55:42 an-a04n01 cib[28355]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 lrmd[28357]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 attrd[28358]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 pengine[28359]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 attrd[28358]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n01 crmd[28360]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 crmd[28360]: notice: main: CRM Git Version: 368c726
Jun 17 23:55:42 an-a04n01 stonith-ng[28356]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n01 stonith-ng[28356]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n01 attrd[28358]:   notice: main: Starting mainloop...
Jun 17 23:55:42 an-a04n01 cib[28355]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman

Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: cman_event_callback: Membership 276: quorum acquired
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: crm_update_peer_state: cman_event_callback: Node an-a04n01.alteeve.ca[1] - state is now member (was (null))
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: crm_update_peer_state: cman_event_callback: Node an-a04n02.alteeve.ca[2] - state is now member (was (null))
Jun 17 23:55:43 an-a04n01 stonith-ng[28356]: notice: setup_cib: Watching for stonith topology changes
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: do_started: The local CRM is operational
Jun 17 23:55:43 an-a04n01 crmd[28360]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jun 17 23:55:43 an-a04n01 stonith-ng[28356]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 17 23:55:44 an-a04n01 stonith-ng[28356]: notice: stonith_device_register: Added 'fence_n01_ipmi' to the device list (1 active devices)
Jun 17 23:55:45 an-a04n01 stonith-ng[28356]: notice: stonith_device_register: Added 'fence_n02_ipmi' to the device list (2 active devices)

Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jun 17 23:56:04 an-a04n01 attrd[28358]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start fence_n01_ipmi#011(an-a04n01.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start fence_n02_ipmi#011(an-a04n02.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start drbd_r0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start drbd_r0:1#011(an-a04n02.alteeve.ca)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start lvm_n01_vg0:0#011(an-a04n01.alteeve.ca - blocked)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: LogActions: Start lvm_n01_vg0:1#011(an-a04n02.alteeve.ca - blocked)
Jun 17 23:56:04 an-a04n01 pengine[28359]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-152.bz2
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 9: monitor fence_n01_ipmi_monitor_0 on an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 4: monitor fence_n01_ipmi_monitor_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 10: monitor fence_n02_ipmi_monitor_0 on an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 5: monitor fence_n02_ipmi_monitor_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 6: monitor drbd_r0:0_monitor_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 11: monitor drbd_r0:1_monitor_0 on an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 7: monitor lvm_n01_vg0:0_monitor_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:04 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 12: monitor lvm_n01_vg0:1_monitor_0 on an-a04n02.alteeve.ca
Jun 17 23:56:04 an-a04n01 LVM(lvm_n01_vg0)[28419]: WARNING: LVM Volume an-a04n01_vg0 is not available (stopped)
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation lvm_n01_vg0_monitor_0 (call=19, rc=7, cib-update=28, confirmed=true) not running
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_monitor_0 (call=14, rc=7, cib-update=29, confirmed=true) not running
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: process_lrm_event: an-a04n01.alteeve.ca-drbd_r0_monitor_0:14 [ \n ]
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on an-a04n01.alteeve.ca (local) - no waiting
Jun 17 23:56:05 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jun 17 23:56:05 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 4: probe_complete=true
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 8: probe_complete probe_complete on an-a04n02.alteeve.ca - no waiting
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 13: start fence_n01_ipmi_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 15: start fence_n02_ipmi_start_0 on an-a04n02.alteeve.ca
Jun 17 23:56:05 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 17: start drbd_r0:0_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 stonith-ng[28356]: notice: stonith_device_register: Device 'fence_n01_ipmi' already existed in device list (2 active devices)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 19: start drbd_r0:1_start_0 on an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation fence_n01_ipmi_start_0 (call=25, rc=0, cib-update=30, confirmed=true) ok
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 14: monitor fence_n01_ipmi_monitor_60000 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 16: monitor fence_n02_ipmi_monitor_60000 on an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation fence_n01_ipmi_monitor_60000 (call=30, rc=0, cib-update=31, confirmed=false) ok
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Starting worker thread (from cqueue [3274])
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: disk( Diskless -> Attaching )
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Found 4 transactions (126 active extents) in activity log.
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Method to ensure write ordering: flush
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: drbd_bm_resize called with capacity == 909525832
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: resync bitmap: bits=113690729 words=1776418 pages=3470
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: size = 434 GB (454762916 KB)
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: bitmap READ of 3470 pages took 9 jiffies
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: recounting of set bits took additional 16 jiffies
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: disk( Attaching -> Consistent )
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: attached to UUIDs C71081B1CBAFC620:0000000000000000:F9F9DA52F6D93990:F9F8DA52F6D93991
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: conn( StandAlone -> Unconnected )
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: Starting receiver thread (from drbd0_worker [28524])
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: receiver (re)started
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 17 23:56:06 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
Jun 17 23:56:06 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 9: master-drbd_r0=5
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=27, rc=0, cib-update=32, confirmed=true) ok
Jun 17 23:56:06 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 11: master-drbd_r0=5
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 82: notify drbd_r0:0_post_notify_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 83: notify drbd_r0:1_post_notify_start_0 on an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: run_graph: Transition 0 (Complete=25, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-152.bz2): Stopped
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Promote drbd_r0:0#011(Slave -> Master an-a04n01.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Promote drbd_r0:1#011(Slave -> Master an-a04n02.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Start lvm_n01_vg0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: LogActions: Start lvm_n01_vg0:1#011(an-a04n02.alteeve.ca)
Jun 17 23:56:06 an-a04n01 pengine[28359]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-153.bz2
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 84: notify drbd_r0_pre_notify_promote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 86: notify drbd_r0_pre_notify_promote_0 on an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=37, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 13: promote drbd_r0_promote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:06 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 16: promote drbd_r0_promote_0 on an-a04n02.alteeve.ca
Jun 17 23:56:06 an-a04n01 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jun 17 23:56:07 an-a04n01 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
Jun 17 23:56:07 an-a04n01 stonith_admin[28637]: notice: crm_log_args: Invoked: stonith_admin --fence an-a04n02.alteeve.ca
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: handle_request: Client stonith_admin.28637.6ed13ba6 wants to fence (off) 'an-a04n02.alteeve.ca' with device '(any)'
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: initiate_remote_stonith_op: Initiating remote operation off for an-a04n02.alteeve.ca: 382bfa3d-55da-4eed-ad8a-a1a883022a35 (0)
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n02_ipmi can fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n01_ipmi can not fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n02_ipmi can fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n01_ipmi can not fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n02_ipmi can not fence an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n01_ipmi can fence an-a04n01.alteeve.ca: static-list

Jun 17 23:56:23 an-a04n01 stonith-ng[28356]: notice: log_operation: Operation 'off' [28638] (call 2 from stonith_admin.28637) for host 'an-a04n02.alteeve.ca' with device 'fence_n02_ipmi' returned: 0 (OK)
Jun 17 23:56:25 an-a04n01 corosync[28088]: [TOTEM ] A processor failed, forming new configuration.
Jun 17 23:56:26 an-a04n01 lrmd[28357]: warning: child_timeout_callback: drbd_r0_promote_0 process (PID 28604) timed out
Jun 17 23:56:26 an-a04n01 lrmd[28357]: warning: operation_finished: drbd_r0_promote_0:28604 - timed out after 20000ms
Jun 17 23:56:26 an-a04n01 crmd[28360]: error: process_lrm_event: LRM operation drbd_r0_promote_0 (40) Timed Out (timeout=20000ms)
Jun 17 23:56:26 an-a04n01 crmd[28360]: notice: process_lrm_event: an-a04n01.alteeve.ca-drbd_r0_promote_0:40 [ allow-two-primaries;\n ]
Jun 17 23:56:26 an-a04n01 crmd[28360]: warning: status_from_rc: Action 13 (drbd_r0_promote_0) on an-a04n01.alteeve.ca failed (target: 0 vs. rc: 1): Error
Jun 17 23:56:26 an-a04n01 crmd[28360]: warning: update_failcount: Updating failcount for drbd_r0 on an-a04n01.alteeve.ca after failed promote: rc=1 (update=value++, time=1403063786)
Jun 17 23:56:26 an-a04n01 crmd[28360]: warning: update_failcount: Updating failcount for drbd_r0 on an-a04n01.alteeve.ca after failed promote: rc=1 (update=value++, time=1403063786)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_r0 (1)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 14: fail-count-drbd_r0=1
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_r0 (1403063786)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 17: last-failure-drbd_r0=1403063786
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_r0 (2)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 19: fail-count-drbd_r0=2
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_r0 (1403063786)
Jun 17 23:56:26 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 21: last-failure-drbd_r0=1403063786
Jun 17 23:56:27 an-a04n01 corosync[28088]:   [QUORUM] Members[1]: 1
Jun 17 23:56:27 an-a04n01 corosync[28088]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: crm_update_peer_state: cman_event_callback: Node an-a04n02.alteeve.ca[2] - state is now lost (was member)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning: match_down_event: No match for shutdown action on an-a04n02.alteeve.ca
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: peer_update_callback: Stonith/shutdown of an-a04n02.alteeve.ca not matched
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 87 (87) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 16 (16) was pending on an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 89 (89) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 44 (44) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 43 (43) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning: fail_incompletable_actions: Node an-a04n02.alteeve.ca shutdown resulted in un-runnable actions
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 87 (87) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 16 (16) was pending on an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 89 (89) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 44 (44) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 43 (43) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 kernel: dlm: closing connection to node 2
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning: fail_incompletable_actions: Node an-a04n02.alteeve.ca shutdown resulted in un-runnable actions
Jun 17 23:56:27 an-a04n01 corosync[28088]: [CPG ] chosen downlist: sender r(0) ip(10.20.40.1) ; members(old:2 left:1)
Jun 17 23:56:27 an-a04n01 corosync[28088]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: remote_op_done: Operation off of an-a04n02.alteeve.ca by an-a04n01.alteeve.ca for stonith_admin.28...@an-a04n01.alteeve.ca.382bfa3d: OK
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning: match_down_event: No match for shutdown action on an-a04n02.alteeve.ca
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: peer_update_callback: Stonith/shutdown of an-a04n02.alteeve.ca not matched
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 87 (87) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 16 (16) was pending on an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 89 (89) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 44 (44) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: fail_incompletable_actions: Action 43 (43) is scheduled for an-a04n02.alteeve.ca (offline)
Jun 17 23:56:27 an-a04n01 crmd[28360]: warning: fail_incompletable_actions: Node an-a04n02.alteeve.ca shutdown resulted in un-runnable actions
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: tengine_stonith_notify: Peer an-a04n02.alteeve.ca was terminated (off) by an-a04n01.alteeve.ca for an-a04n01.alteeve.ca: OK (ref=382bfa3d-55da-4eed-ad8a-a1a883022a35) by client stonith_admin.28637
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: tengine_stonith_notify: Notified CMAN that 'an-a04n02.alteeve.ca' is now fenced
Jun 17 23:56:27 an-a04n01 fenced[28143]: fencing node an-a04n02.alteeve.ca
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 85: notify drbd_r0_post_notify_promote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:27 an-a04n01 stonith_admin-fence-peer.sh[28708]: stonith_admin successfully fenced peer an-a04n02.alteeve.ca.
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 7 (0x700)
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: fence-peer helper returned 7 (peer was stonithed)
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: role( Secondary -> Primary ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> Outdated )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: new current UUID B704B7175D09E91D:C71081B1CBAFC620:F9F9DA52F6D93990:F9F8DA52F6D93991
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Starting asender thread (from drbd0_receiver [28542])
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: data-integrity-alg: <not-used>
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_r0 (1403063786)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_r0 (2)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 27: fail-count-drbd_r0=2
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
Jun 17 23:56:27 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 31: master-drbd_r0=10000
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=43, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: run_graph: Transition 1 (Complete=12, Pending=0, Fired=0, Skipped=8, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-153.bz2): Stopped
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 17 23:56:27 an-a04n01 pengine[28359]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:0 on an-a04n01.alteeve.ca: unknown error (1)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Start fence_n02_ipmi#011(an-a04n01.alteeve.ca)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Demote drbd_r0:0#011(Master -> Slave an-a04n01.alteeve.ca)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Recover drbd_r0:0#011(Master an-a04n01.alteeve.ca)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: LogActions: Start lvm_n01_vg0:0#011(an-a04n01.alteeve.ca - blocked)
Jun 17 23:56:27 an-a04n01 pengine[28359]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-154.bz2
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 8: start fence_n02_ipmi_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:27 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 75: notify drbd_r0_pre_notify_demote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:27 an-a04n01 fence_pcmk[28761]: Requesting Pacemaker fence an-a04n02.alteeve.ca (reset)
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: stonith_device_register: Device 'fence_n02_ipmi' already existed in device list (2 active devices)
Jun 17 23:56:27 an-a04n01 stonith_admin[28763]: notice: crm_log_args: Invoked: stonith_admin --reboot an-a04n02.alteeve.ca --tolerance 5s --tag cman
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: handle_request: Client stonith_admin.cman.28763.4e2c3020 wants to fence (reboot) 'an-a04n02.alteeve.ca' with device '(any)'
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for an-a04n02.alteeve.ca: bbb6c5c4-d1a7-4df7-a8b0-e33f4ad74860 (0)
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n02_ipmi can fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n01_ipmi can not fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n02_ipmi can fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 stonith-ng[28356]: notice: can_fence_host_with_device: fence_n01_ipmi can not fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: PingAck did not arrive in time.
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: asender terminated
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Terminating drbd0_asender
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Connection closed
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: receiver terminated
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: Restarting drbd0_receiver
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: receiver (re)started
Jun 17 23:56:27 an-a04n01 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation fence_n02_ipmi_start_0 (call=46, rc=0, cib-update=48, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=48, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 10: demote drbd_r0_demote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 9: monitor fence_n02_ipmi_monitor_60000 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: role( Primary -> Secondary )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_demote_0 (call=52, rc=0, cib-update=49, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 76: notify drbd_r0_post_notify_demote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=57, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 74: notify drbd_r0_pre_notify_stop_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=60, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 2: stop drbd_r0_stop_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: conn( WFConnection -> Disconnecting )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Discarding network configuration.
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Connection closed
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: receiver terminated
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Terminating drbd0_receiver
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: disk( UpToDate -> Failed )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: disk( Failed -> Diskless )
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: worker terminated
Jun 17 23:56:28 an-a04n01 kernel: block drbd0: Terminating drbd0_worker
Jun 17 23:56:28 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (<null>)
Jun 17 23:56:28 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent delete 33: node=an-a04n01.alteeve.ca, attr=master-drbd_r0, id=<n/a>, set=(null), section=status
Jun 17 23:56:28 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_stop_0 (call=63, rc=0, cib-update=50, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 stonith-ng[28356]: notice: log_operation: Operation 'reboot' [28771] (call 2 from stonith_admin.cman.28763) for host 'an-a04n02.alteeve.ca' with device 'fence_n02_ipmi' returned: 0 (OK)
Jun 17 23:56:43 an-a04n01 stonith-ng[28356]: notice: remote_op_done: Operation reboot of an-a04n02.alteeve.ca by an-a04n01.alteeve.ca for stonith_admin.cman.28...@an-a04n01.alteeve.ca.bbb6c5c4: OK
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: tengine_stonith_notify: Peer an-a04n02.alteeve.ca was terminated (reboot) by an-a04n01.alteeve.ca for an-a04n01.alteeve.ca: OK (ref=bbb6c5c4-d1a7-4df7-a8b0-e33f4ad74860) by client stonith_admin.cman.28763
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: tengine_stonith_notify: Notified CMAN that 'an-a04n02.alteeve.ca' is now fenced
Jun 17 23:56:43 an-a04n01 fenced[28143]: fence an-a04n02.alteeve.ca success
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation fence_n02_ipmi_monitor_60000 (call=54, rc=0, cib-update=54, confirmed=false) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: run_graph: Transition 2 (Complete=19, Pending=0, Fired=0, Skipped=6, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-154.bz2): Stopped
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 17 23:56:43 an-a04n01 pengine[28359]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:0 on an-a04n01.alteeve.ca: unknown error (1)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Start drbd_r0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Start lvm_n01_vg0:0#011(an-a04n01.alteeve.ca - blocked)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-155.bz2
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 10: start drbd_r0_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Starting worker thread (from cqueue [3274])
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: disk( Diskless -> Attaching )
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Found 4 transactions (126 active extents) in activity log.
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Method to ensure write ordering: flush
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: drbd_bm_resize called with capacity == 909525832
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: resync bitmap: bits=113690729 words=1776418 pages=3470
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: size = 434 GB (454762916 KB)
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: bitmap READ of 3470 pages took 9 jiffies
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: recounting of set bits took additional 16 jiffies
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: attached to UUIDs B704B7175D09E91D:C71081B1CBAFC620:F9F9DA52F6D93990:F9F8DA52F6D93991
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: conn( StandAlone -> Unconnected )
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: Starting receiver thread (from drbd0_worker [29023])
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: receiver (re)started
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 17 23:56:43 an-a04n01 attrd[28358]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (10000)
Jun 17 23:56:43 an-a04n01 attrd[28358]: notice: attrd_perform_update: Sent update 37: master-drbd_r0=10000
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=67, rc=0, cib-update=56, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 73: notify drbd_r0_post_notify_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=70, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: run_graph: Transition 3 (Complete=8, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-155.bz2): Stopped
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 17 23:56:43 an-a04n01 pengine[28359]: warning: unpack_rsc_op: Processing failed op promote for drbd_r0:0 on an-a04n01.alteeve.ca: unknown error (1)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Promote drbd_r0:0#011(Slave -> Master an-a04n01.alteeve.ca)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: LogActions: Start lvm_n01_vg0:0#011(an-a04n01.alteeve.ca)
Jun 17 23:56:43 an-a04n01 pengine[28359]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-156.bz2
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 77: notify drbd_r0_pre_notify_promote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=73, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 12: promote drbd_r0_promote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 kernel: block drbd0: role( Secondary -> Primary )
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_promote_0 (call=76, rc=0, cib-update=58, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 78: notify drbd_r0_post_notify_promote_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=79, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:43 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 37: start lvm_n01_vg0_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:44 an-a04n01 LVM(lvm_n01_vg0)[29173]: INFO: Activating volume group an-a04n01_vg0
Jun 17 23:56:44 an-a04n01 LVM(lvm_n01_vg0)[29173]: INFO: Reading all physical volumes. This may take a while... Found volume group "an-a04n01_vg0" using metadata type lvm2
Jun 17 23:56:44 an-a04n01 LVM(lvm_n01_vg0)[29173]: INFO: 1 logical volume(s) in volume group "an-a04n01_vg0" now active
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation lvm_n01_vg0_start_0 (call=82, rc=0, cib-update=59, confirmed=true) ok
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 79: notify lvm_n01_vg0_post_notify_start_0 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation lvm_n01_vg0_notify_0 (call=85, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: te_rsc_command: Initiating action 38: monitor lvm_n01_vg0_monitor_30000 on an-a04n01.alteeve.ca (local)
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: process_lrm_event: LRM operation lvm_n01_vg0_monitor_30000 (call=88, rc=0, cib-update=60, confirmed=false) ok
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: run_graph: Transition 4 (Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-156.bz2): Complete
Jun 17 23:56:44 an-a04n01 crmd[28360]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
====

Logs from the always-fenced 'an-a04n02', starting with '/etc/init.d/pacemaker start':

====
Jun 17 23:55:32 an-a04n02 kernel: DLM (built Apr 11 2014 17:28:07) installed
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Successfully parsed cman config
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] The network interface [10.20.40.2] is now up.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Using quorum provider quorum_cman
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jun 17 23:55:33 an-a04n02 corosync[7176]: [CMAN ] CMAN 3.0.12.1 (built Apr 3 2014 05:12:26) started
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync configuration service
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync profile loading service
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] Using quorum provider quorum_cman
Jun 17 23:55:33 an-a04n02 corosync[7176]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 17 23:55:33 an-a04n02 corosync[7176]: [CMAN ] quorum regained, resuming activity
Jun 17 23:55:33 an-a04n02 corosync[7176]: [QUORUM] This node is within the primary component and will provide service.
Jun 17 23:55:33 an-a04n02 corosync[7176]:   [QUORUM] Members[1]: 2
Jun 17 23:55:33 an-a04n02 corosync[7176]:   [QUORUM] Members[1]: 2
Jun 17 23:55:33 an-a04n02 corosync[7176]:   [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n02 corosync[7176]:   [QUORUM] Members[2]: 1 2
Jun 17 23:55:33 an-a04n02 corosync[7176]: [CPG ] chosen downlist: sender r(0) ip(10.20.40.1) ; members(old:1 left:0)
Jun 17 23:55:33 an-a04n02 corosync[7176]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 17 23:55:37 an-a04n02 fenced[7231]: fenced 3.0.12.1 started
Jun 17 23:55:37 an-a04n02 dlm_controld[7254]: dlm_controld 3.0.12.1 started
Jun 17 23:55:38 an-a04n02 gfs_controld[7306]: gfs_controld 3.0.12.1 started
Jun 17 23:55:39 an-a04n02 pacemaker: Attempting to start clvmd
Jun 17 23:55:40 an-a04n02 kernel: dlm: Using TCP for communications
Jun 17 23:55:40 an-a04n02 kernel: dlm: got connection from 1
Jun 17 23:55:41 an-a04n02 clvmd: Cluster LVM daemon started - connected to CMAN
Jun 17 23:55:41 an-a04n02 pacemaker: Starting Pacemaker Cluster Manager
Jun 17 23:55:42 an-a04n02 pacemakerd[7437]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 pacemakerd[7437]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
Jun 17 23:55:42 an-a04n02 lrmd[7445]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 stonith-ng[7444]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 cib[7443]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 crmd[7448]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 pengine[7447]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 attrd[7446]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Jun 17 23:55:42 an-a04n02 stonith-ng[7444]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n02 crmd[7448]: notice: main: CRM Git Version: 368c726
Jun 17 23:55:42 an-a04n02 attrd[7446]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:42 an-a04n02 attrd[7446]:   notice: main: Starting mainloop...
Jun 17 23:55:42 an-a04n02 cib[7443]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: crm_cluster_connect: Connecting to cluster infrastructure: cman
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: cman_event_callback: Membership 276: quorum acquired
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: crm_update_peer_state: cman_event_callback: Node an-a04n01.alteeve.ca[1] - state is now member (was (null))
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: crm_update_peer_state: cman_event_callback: Node an-a04n02.alteeve.ca[2] - state is now member (was (null))
Jun 17 23:55:43 an-a04n02 stonith-ng[7444]: notice: setup_cib: Watching for stonith topology changes
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: do_started: The local CRM is operational
Jun 17 23:55:43 an-a04n02 crmd[7448]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jun 17 23:55:43 an-a04n02 stonith-ng[7444]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 17 23:55:44 an-a04n02 stonith-ng[7444]: notice: stonith_device_register: Added 'fence_n01_ipmi' to the device list (1 active devices)
Jun 17 23:55:45 an-a04n02 stonith-ng[7444]: notice: stonith_device_register: Added 'fence_n02_ipmi' to the device list (2 active devices)

Jun 17 23:56:04 an-a04n02 crmd[7448]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jun 17 23:56:04 an-a04n02 crmd[7448]: notice: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jun 17 23:56:04 an-a04n02 attrd[7446]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jun 17 23:56:04 an-a04n02 crmd[7448]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jun 17 23:56:05 an-a04n02 LVM(lvm_n01_vg0)[7509]: WARNING: LVM Volume an-a04n01_vg0 is not available (stopped)
Jun 17 23:56:05 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM operation lvm_n01_vg0_monitor_0 (call=20, rc=7, cib-update=11, confirmed=true) not running
Jun 17 23:56:05 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM operation drbd_r0_monitor_0 (call=15, rc=7, cib-update=12, confirmed=true) not running
Jun 17 23:56:05 an-a04n02 crmd[7448]: notice: process_lrm_event: an-a04n02.alteeve.ca-drbd_r0_monitor_0:15 [ \n ]
Jun 17 23:56:05 an-a04n02 attrd[7446]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jun 17 23:56:05 an-a04n02 attrd[7446]: notice: attrd_perform_update: Sent update 5: probe_complete=true
Jun 17 23:56:06 an-a04n02 stonith-ng[7444]: notice: stonith_device_register: Device 'fence_n02_ipmi' already existed in device list (2 active devices)
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM operation fence_n02_ipmi_start_0 (call=25, rc=0, cib-update=13, confirmed=true) ok
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM operation fence_n02_ipmi_monitor_60000 (call=30, rc=0, cib-update=14, confirmed=false) ok
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Starting worker thread (from cqueue [3220])
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: disk( Diskless -> Attaching )
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Found 3 transactions (3 active extents) in activity log.
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Method to ensure write ordering: flush
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: drbd_bm_resize called with capacity == 909525832
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: resync bitmap: bits=113690729 words=1776418 pages=3470
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: size = 434 GB (454762916 KB)
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: bitmap READ of 3470 pages took 8 jiffies
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: recounting of set bits took additional 17 jiffies
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: disk( Attaching -> Consistent )
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: attached to UUIDs C71081B1CBAFC620:0000000000000000:F9F9DA52F6D93991:F9F8DA52F6D93991
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: conn( StandAlone -> Unconnected )
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: Starting receiver thread (from drbd0_worker [7613])
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: receiver (re)started
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 17 23:56:06 an-a04n02 attrd[7446]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_r0 (5)
Jun 17 23:56:06 an-a04n02 attrd[7446]: notice: attrd_perform_update: Sent update 8: master-drbd_r0=5
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM operation drbd_r0_start_0 (call=27, rc=0, cib-update=15, confirmed=true) ok
Jun 17 23:56:06 an-a04n02 attrd[7446]: notice: attrd_perform_update: Sent update 10: master-drbd_r0=5
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=34, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n02 crmd[7448]: notice: process_lrm_event: LRM operation drbd_r0_notify_0 (call=37, rc=0, cib-update=0, confirmed=true) ok
Jun 17 23:56:06 an-a04n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jun 17 23:56:07 an-a04n02 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: can_fence_host_with_device: fence_n02_ipmi can fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: can_fence_host_with_device: fence_n01_ipmi can not fence an-a04n02.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith_admin[7726]: notice: crm_log_args: Invoked: stonith_admin --fence an-a04n01.alteeve.ca
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: handle_request: Client stonith_admin.7726.0f660392 wants to fence (off) 'an-a04n01.alteeve.ca' with device '(any)'
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: initiate_remote_stonith_op: Initiating remote operation off for an-a04n01.alteeve.ca: fd2fafff-174a-4744-b83c-e762c88ed12b (0)
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: can_fence_host_with_device: fence_n02_ipmi can not fence an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: can_fence_host_with_device: fence_n01_ipmi can fence an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: can_fence_host_with_device: fence_n02_ipmi can not fence an-a04n01.alteeve.ca: static-list
Jun 17 23:56:07 an-a04n02 stonith-ng[7444]: notice: can_fence_host_with_device: fence_n01_ipmi can fence an-a04n01.alteeve.ca: static-list
Jun 17 23:56:08 an-a04n02 ntpd[2540]: 0.0.0.0 c612 02 freq_set kernel 16.841 PPM
Jun 17 23:56:08 an-a04n02 ntpd[2540]: 0.0.0.0 c615 05 clock_sync
====
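For what it's worth, in the config above the only constraint on lvm_n01_vg0_Clone is the mandatory order after the drbd_r0_Clone promote; there is no colocation tying it to where DRBD is actually Master. If that turns out to be the problem, the missing piece would presumably be something like this (pcs syntax as I understand it for this version, untested here):

====
# Keep the LVM clone on nodes where the DRBD clone is in the Master role
pcs constraint colocation add lvm_n01_vg0_Clone with master drbd_r0_Clone INFINITY
====

Without it, I'd expect the policy engine to be free to schedule the VG start independently of the promote outcome, which seems consistent with the 'blocked' LogActions entries in the logs above.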

Cluestick beatings welcomed...

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
