Hello, I have two active/passive failover systems built with Corosync and DRBD. One system consists of two Debian servers, the other of two Ubuntu servers. The Debian pair provides web server failover and the Ubuntu pair provides database server failover.
I applied the same Pacemaker configuration to both. Everything works fine: failover works nicely, and so does the filesystem synchronization. On the Ubuntu pair, however, an error always appears after a couple of weeks or months. Pacemaker on ubuntu1 reports a different status than on ubuntu2: ubuntu1 assumes that ubuntu2 is down, while ubuntu2 assumes that something has happened to ubuntu1 but stays alive and takes over the resources. As a result the DRBD resource cannot be taken over, so no failover happens, and we have to restart the server manually because restarting Pacemaker and Corosync does not help. I have changed the Pacemaker configuration a couple of times, but the problem still exists. Has anyone experienced this? I use Ubuntu 14.04.1 LTS.

I got this error in apport.log:

ERROR: apport (pid 20361) Fri Dec 19 02:43:52 2014: executable: /usr/lib/pacemaker/lrmd (command line "/usr/lib/pacemaker/lrmd")
ERROR: apport (pid 20361) Fri Dec 19 02:43:52 2014: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 20361) Fri Dec 19 02:43:52 2014: wrote report /var/crash/_usr_lib_pacemaker_lrmd.0.crash

my pacemaker configuration:

node $id="1" db \
        attributes standby="off"
node $id="2" db2 \
        attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.100" cidr_netmask="24" \
        op monitor interval="30s"
primitive DBase ocf:heartbeat:mysql \
        meta target-role="Started" \
        op start timeout="120s" interval="0" \
        op stop timeout="120s" interval="0" \
        op monitor interval="20s" timeout="30s"
primitive DbFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/sync" fstype="ext4" \
        op start timeout="60s" interval="0" \
        op stop timeout="180s" interval="0" \
        op monitor interval="60s" timeout="60s"
primitive Links lsb:drbdlinks
primitive r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="29s" role="Master" \
        op start timeout="240s" interval="0" \
        op stop timeout="180s" interval="0" \
        op promote timeout="180s" interval="0" \
        op demote timeout="180s" interval="0" \
        op monitor interval="30s" role="Slave"
group DbServer ClusterIP DbFS Links DBase
ms ms_r0 r0 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
location prefer-db DbServer 50: db
colocation DbServer-with-ms_ro inf: DbServer ms_r0:Master
order DbServer-after-ms_ro inf: ms_r0:promote DbServer:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1363370585"

my corosync config:

totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 3600
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: off
        threads: 0
        rrp_mode: none
        transport: udpu
        cluster_name: Dbcluster
}

nodelist {
        node {
                ring0_addr: db
                nodeid: 1
        }
        node {
                ring0_addr: db2
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
}

amf {
        mode: disabled
}

service {
        ver: 0
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: no
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}
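One thing I noticed in the db2 log below is the warning "Corosync main process was not scheduled for 2856.1863 ms (threshold is 2400.0000 ms). Consider token timeout increase." The 2400 ms threshold appears to be 80% of my 3000 ms token. Would raising the token timeout along these lines be reasonable? (Just a sketch; 10000 ms is a value I picked, not something from the docs.)

totem {
        version: 2
        token: 10000
        token_retransmits_before_loss_const: 10
        ...
}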
reboot -f"; pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f"; local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f"; } startup { degr-wfc-timeout 120; } disk { on-io-error detach; } syncer { rate 100M; al-extents 257; } } resource r0 { protocol C; flexible-meta-disk internal; on db2 { address 192.168.0.10:7801; device /dev/drbd0 minor 0; disk /dev/sdb1; } on db { device /dev/drbd0 minor 0; disk /dev/db/sync; address 192.168.0.20:7801; } handlers { split-brain "/usr/lib/drbd/notify-split-brain.sh root"; } net { after-sb-0pri discard-younger-primary; #discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri call-pri-lost-after-sb; } } I have no idea, how to solve this problem. Maybe someone can help me. best regards, ariee
Log from node db:

Dec 18 23:19:31 [1167] db corosync notice [TOTEM ] A processor failed, forming new configuration.
Dec 18 23:19:32 [1167] db corosync notice [TOTEM ] A new membership (ipubuntu1:1040) was formed. Members
Dec 18 23:19:35 [1167] db corosync notice [TOTEM ] A processor failed, forming new configuration.
Dec 18 23:19:39 [1167] db corosync notice [TOTEM ] A new membership (ipubuntu1:1044) was formed. Members left: 2
Dec 18 23:19:39 [1289] db cib: info: pcmk_cpg_membership: Left[1.0] cib.2
Dec 18 23:19:39 [1289] db cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node db2[2] - corosync-cpg is now offline
Dec 18 23:19:39 [1294] db crmd: info: pcmk_cpg_membership: Left[1.0] crmd.2
Dec 18 23:19:39 [1289] db cib: info: pcmk_cpg_membership: Member[1.0] cib.1
Dec 18 23:19:39 [1294] db crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node db2[2] - corosync-cpg is now offline
Dec 18 23:19:39 [1294] db crmd: info: peer_update_callback: Client db2/peer now has status [offline] (DC=db2)
Dec 18 23:19:39 [1287] db pacemakerd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=7): Try again (6)
Dec 18 23:19:39 [1294] db crmd: notice: peer_update_callback: Our peer on the DC is dead
Dec 18 23:19:39 [1290] db stonith-ng: info: pcmk_cpg_membership: Left[1.0] stonith-ng.2
Dec 18 23:19:39 [1294] db crmd: info: pcmk_cpg_membership: Member[1.0] crmd.1
Dec 18 23:19:39 [1290] db stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node db2[2] - corosync-cpg is now offline
Dec 18 23:19:39 [1294] db crmd: notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Dec 18 23:19:39 [1294] db crmd: info: update_dc: Unset DC. Was db2
Dec 18 23:19:39 [1290] db stonith-ng: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=2): Try again (6)
Dec 18 23:19:39 [1290] db stonith-ng: info: pcmk_cpg_membership: Member[1.0] stonith-ng.1
Dec 18 23:19:39 [1294] db crmd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=14): Try again (6)
Dec 18 23:19:39 [1167] db corosync notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Dec 18 23:19:39 [1287] db pacemakerd: notice: pcmk_quorum_notification: Membership 1044: quorum lost (1)
Dec 18 23:19:39 [1294] db crmd: notice: pcmk_quorum_notification: Membership 1044: quorum lost (1)
Dec 18 23:19:39 [1167] db corosync notice [QUORUM] Members[1]: 1
Dec 18 23:19:39 [1167] db corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Dec 18 23:19:39 [1294] db crmd: notice: crm_update_peer_state: pcmk_quorum_notification: Node db2[2] - state is now lost (was member)
Dec 18 23:19:39 [1287] db pacemakerd: notice: crm_update_peer_state: pcmk_quorum_notification: Node db2[2] - state is now lost (was member)
Dec 18 23:19:39 [1294] db crmd: info: peer_update_callback: db2 is now lost (was member)
Dec 18 23:19:39 [1294] db crmd: info: crm_cs_flush: Sent 2 CPG messages (0 remaining, last=16): OK (1)
Dec 18 23:19:39 [1294] db crmd: info: do_log: FSA: Input I_ELECTION_DC from do_election_check() received in state S_ELECTION
Dec 18 23:19:39 [1294] db crmd: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Dec 18 23:19:39 [1294] db crmd: info: do_te_control: Registering TE UUID: 3f9f2c50-dc16-44bf-94cd-30e05a0e1df6
Dec 18 23:19:39 [1294] db crmd: info: set_graph_functions: Setting custom graph functions
Dec 18 23:19:39 [1293] db pengine: info: crm_client_new: Connecting 0x7f53a8043710 for uid=109 gid=118 pid=1294 id=09b157e6-3fdd-4a6d-8b51-0449cb81d6d6
Dec 18 23:19:39 [1294] db crmd: info: do_dc_takeover: Taking over DC status for this partition
Dec 18 23:19:39 [1289] db cib: info: cib_process_readwrite: We are now in R/W mode
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_master operation for section 'all': OK (rc=0, origin=local/crmd/28, version=0.106.18)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section cib: OK (rc=0, origin=local/crmd/29, version=0.106.18)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_query operation for section //cib/configuration/crm_config//cluster_property_set//nvpair[@name='dc-version']: OK (rc=0, origin=local/crmd/30, version=0.106.18)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section crm_config: OK (rc=0, origin=local/crmd/31, version=0.106.18)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_query operation for section //cib/configuration/crm_config//cluster_property_set//nvpair[@name='cluster-infrastructure']: OK (rc=0, origin=local/crmd/32, version=0.106.18)
Dec 18 23:19:39 [1294] db crmd: info: join_make_offer: Making join offers based on membership 1044
Dec 18 23:19:39 [1294] db crmd: info: join_make_offer: join-1: Sending offer to db
Dec 18 23:19:39 [1294] db crmd: info: crm_update_peer_join: join_make_offer: Node db[1] - join-1 phase 0 -> 1
Dec 18 23:19:39 [1294] db crmd: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Dec 18 23:19:39 [1294] db crmd: info: update_dc: Set DC to db (3.0.7)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section crm_config: OK (rc=0, origin=local/crmd/33, version=0.106.18)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/35, version=0.106.18)
Dec 18 23:19:39 [1294] db crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node db[1] - join-1 phase 1 -> 2
Dec 18 23:19:39 [1294] db crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: Node db[1] - expected state is now member
Dec 18 23:19:39 [1294] db crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 18 23:19:39 [1294] db crmd: info: crmd_join_phase_log: join-1: db2=none
Dec 18 23:19:39 [1294] db crmd: info: crmd_join_phase_log: join-1: db=integrated
Dec 18 23:19:39 [1294] db crmd: info: do_dc_join_finalize: join-1: Syncing our CIB to the rest of the cluster
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_sync operation for section 'all': OK (rc=0, origin=local/crmd/36, version=0.106.18)
Dec 18 23:19:39 [1294] db crmd: info: crm_update_peer_join: finalize_join_for: Node db[1] - join-1 phase 2 -> 3
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=local/crmd/37, version=0.106.18)
Dec 18 23:19:39 [1294] db crmd: info: services_os_action_execute: Managed Filesystem_meta-data_0 process 8975 exited with rc=0
Dec 18 23:19:39 [1294] db crmd: info: services_os_action_execute: Managed IPaddr2_meta-data_0 process 8980 exited with rc=0
Dec 18 23:19:39 [1294] db crmd: info: services_os_action_execute: Managed mysql_meta-data_0 process 8984 exited with rc=0
Dec 18 23:19:39 [1294] db crmd: info: crm_update_peer_join: do_dc_join_ack: Node db[1] - join-1 phase 3 -> 4
Dec 18 23:19:39 [1294] db crmd: info: do_dc_join_ack: join-1: Updating node state to member for db
Dec 18 23:19:39 [1294] db crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='db']/lrm
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='db']/lrm: OK (rc=0, origin=local/crmd/38, version=0.106.19)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/39, version=0.106.20)
Dec 18 23:19:39 [1294] db crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 18 23:19:39 [1294] db crmd: info: abort_transition_graph: do_te_invoke:151 - Triggered transition abort (complete=1) : Peer Cancelled
Dec 18 23:19:39 [1292] db attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 18 23:19:39 [1292] db attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-r0 (10000)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=local/crmd/40, version=0.106.20)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/41, version=0.106.21)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section cib: OK (rc=0, origin=local/crmd/42, version=0.106.22)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/43, version=0.106.22)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='1']//transient_attributes//nvpair[@name='master-r0']: OK (rc=0, origin=local/attrd/12, version=0.106.22)
Dec 18 23:19:39 [1292] db attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/13, version=0.106.22)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='1']//transient_attributes//nvpair[@name='probe_complete']: OK (rc=0, origin=local/attrd/14, version=0.106.22)
Dec 18 23:19:39 [1289] db cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/15, version=0.106.22)
Dec 18 23:19:40 [1289] db cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/44, version=0.106.22)
Dec 18 23:19:40 [1293] db pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 18 23:19:40 [1293] db pengine: info: determine_online_status: Node db is online
Dec 18 23:19:40 [1293] db pengine: info: clone_print: Master/Slave Set: ms_r0 [r0]
Dec 18 23:19:40 [1293] db pengine: info: short_print: Masters: [ db ]
Dec 18 23:19:40 [1293] db pengine: info: short_print: Stopped: [ db2 ]
Dec 18 23:19:40 [1293] db pengine: info: group_print: Resource Group: DbServer
Dec 18 23:19:40 [1293] db pengine: info: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started db
Dec 18 23:19:40 [1293] db pengine: info: native_print: DbFS (ocf::heartbeat:Filesystem): Started db
Dec 18 23:19:40 [1293] db pengine: info: native_print: Links (lsb:drbdlinks): Started db
Dec 18 23:19:40 [1293] db pengine: info: native_print: DBase (ocf::heartbeat:mysql): Started db
Dec 18 23:19:40 [1293] db pengine: info: native_color: Resource r0:1 cannot run anywhere
Dec 18 23:19:40 [1293] db pengine: info: master_color: Promoting r0:0 (Master db)
Dec 18 23:19:40 [1293] db pengine: info: master_color: ms_r0: Promoted 1 instances of a possible 1 to master
Dec 18 23:19:40 [1293] db pengine: info: LogActions: Leave r0:0 (Master db)
Dec 18 23:19:40 [1293] db pengine: info: LogActions: Leave r0:1 (Stopped)
Dec 18 23:19:40 [1293] db pengine: info: LogActions: Leave ClusterIP (Started db)
Dec 18 23:19:40 [1293] db pengine: info: LogActions: Leave DbFS (Started db)
Dec 18 23:19:40 [1293] db pengine: info: LogActions: Leave Links (Started db)
Dec 18 23:19:40 [1293] db pengine: info: LogActions: Leave DBase (Started db)
Dec 18 23:19:40 [1293] db pengine: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-1202.bz2
Dec 18 23:19:40 [1294] db crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 18 23:19:40 [1294] db crmd: info: do_te_invoke: Processing graph 0 (ref=pe_calc-dc-1418941180-20) derived from /var/lib/pacemaker/pengine/pe-input-1202.bz2
Dec 18 23:19:40 [1294] db crmd: notice: run_graph: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1202.bz2): Complete
Dec 18 23:19:40 [1294] db crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Dec 18 23:19:40 [1294] db crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 18 23:19:57 [1289] db cib: info: crm_client_new: Connecting 0x7ffbf6cf2f40 for uid=0 gid=0 pid=9069 id=c56388b9-8031-4304-be22-0b1462c1c814
Dec 18 23:19:57 [1289] db cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.106.22)
Dec 18 23:19:57 [1289] db cib: info: crm_client_destroy: Destroying 0 events
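As the log above shows, db sees db2 leave the membership, loses quorum, and takes over the DC role, while db2 (log below) still considers both nodes online. When the cluster is in this state, the commands I would run on each node to compare the two views are the standard tools, nothing specific to my setup:

# run on both nodes and compare the output
corosync-quorumtool -s      # membership and quorum as corosync sees it
crm_mon -1                  # node and resource status as pacemaker sees it
cat /proc/drbd              # connection state as DRBD sees it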
Log from node db2:

Dec 18 23:19:29 [1011] db2 corosync warning [MAIN ] Corosync main process was not scheduled for 2856.1863 ms (threshold is 2400.0000 ms). Consider token timeout increase.
Dec 18 23:19:31 [1011] db2 corosync notice [TOTEM ] A processor failed, forming new configuration.
Dec 18 23:19:31 [1011] db2 corosync notice [TOTEM ] A new membership (ipubuntu1:1040) was formed. Members
Dec 18 23:19:53 [1106] db2 lrmd: warning: child_timeout_callback: r0_monitor_30000 process (PID 23890) timed out
Dec 18 23:19:53 [1106] db2 lrmd: warning: operation_finished: r0_monitor_30000:23890 - timed out after 20000ms
Dec 18 23:19:53 [1109] db2 crmd: error: process_lrm_event: LRM operation r0_monitor_30000 (212) Timed Out (timeout=20000ms)
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/3152, version=0.106.19)
Dec 18 23:19:53 [1104] db2 cib: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=134): Try again (6)
Dec 18 23:19:53 [1109] db2 crmd: info: abort_transition_graph: process_graph_event:520 - Triggered transition abort (complete=1, node=db2, tag=lrm_rsc_op, id=r0_last_failure_0, magic=2:1;10:13:0:c702936a-46bc-42de-8439-3694fb05b2b1, cib=0.106.19) : Old event
Dec 18 23:19:53 [1109] db2 crmd: warning: update_failcount: Updating failcount for r0 on db2 after failed monitor: rc=1 (update=value++, time=1418941193)
Dec 18 23:19:53 [1109] db2 crmd: info: process_graph_event: Detected action (13.10) r0_monitor_30000.212=unknown error: failed
Dec 18 23:19:53 [1109] db2 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Dec 18 23:19:53 [1107] db2 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-r0 (1)
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/3153, version=0.106.19)
Dec 18 23:19:53 [1108] db2 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 18 23:19:53 [1108] db2 pengine: info: determine_online_status: Node db2 is online
Dec 18 23:19:53 [1108] db2 pengine: info: determine_online_status: Node db is online
Dec 18 23:19:53 [1108] db2 pengine: warning: unpack_rsc_op: Processing failed op monitor for r0:0 on db2: unknown error (1)
Dec 18 23:19:53 [1108] db2 pengine: info: clone_print: Master/Slave Set: ms_r0 [r0]
Dec 18 23:19:53 [1108] db2 pengine: info: native_print: r0 (ocf::linbit:drbd): Slave db2 FAILED
Dec 18 23:19:53 [1108] db2 pengine: info: short_print: Masters: [ db ]
Dec 18 23:19:53 [1108] db2 pengine: info: group_print: Resource Group: DbServer
Dec 18 23:19:53 [1108] db2 pengine: info: native_print: ClusterIP (ocf::heartbeat:IPaddr2): Started db
Dec 18 23:19:53 [1108] db2 pengine: info: native_print: DbFS (ocf::heartbeat:Filesystem): Started db
Dec 18 23:19:53 [1108] db2 pengine: info: native_print: Links (lsb:drbdlinks): Started db
Dec 18 23:19:53 [1108] db2 pengine: info: native_print: DBase (ocf::heartbeat:mysql): Started db
Dec 18 23:19:53 [1108] db2 pengine: info: master_color: Promoting r0:1 (Master db)
Dec 18 23:19:53 [1108] db2 pengine: info: master_color: ms_r0: Promoted 1 instances of a possible 1 to master
Dec 18 23:19:53 [1108] db2 pengine: info: RecurringOp: Start recurring monitor (30s) for r0:0 on db2
Dec 18 23:19:53 [1108] db2 pengine: info: RecurringOp: Start recurring monitor (30s) for r0:0 on db2
Dec 18 23:19:53 [1108] db2 pengine: notice: LogActions: Recover r0:0 (Slave db2)
Dec 18 23:19:53 [1108] db2 pengine: info: LogActions: Leave r0:1 (Master db)
Dec 18 23:19:53 [1108] db2 pengine: info: LogActions: Leave ClusterIP (Started db)
Dec 18 23:19:53 [1108] db2 pengine: info: LogActions: Leave DbFS (Started db)
Dec 18 23:19:53 [1108] db2 pengine: info: LogActions: Leave Links (Started db)
Dec 18 23:19:53 [1108] db2 pengine: info: LogActions: Leave DBase (Started db)
Dec 18 23:19:53 [1109] db2 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 18 23:19:53 [1109] db2 crmd: info: do_te_invoke: Processing graph 3033 (ref=pe_calc-dc-1418941193-3212) derived from /var/lib/pacemaker/pengine/pe-input-643.bz2
Dec 18 23:19:53 [1108] db2 pengine: notice: process_pe_message: Calculated Transition 3033: /var/lib/pacemaker/pengine/pe-input-643.bz2
Dec 18 23:19:53 [1109] db2 crmd: notice: te_rsc_command: Initiating action 58: notify r0_pre_notify_stop_0 on db2 (local)
Dec 18 23:19:53 [1109] db2 crmd: info: do_lrm_rsc_op: Performing key=58:3033:0:c702936a-46bc-42de-8439-3694fb05b2b1 op=r0_notify_0
Dec 18 23:19:53 [1106] db2 lrmd: info: log_execute: executing - rsc:r0 action:notify call_id:216
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='fail-count-r0']: No such device or address (rc=-6, origin=local/attrd/45, version=0.106.19)
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_query operation for section /cib: OK (rc=0, origin=local/attrd/46, version=0.106.19)
Dec 18 23:19:53 [1107] db2 attrd: notice: attrd_perform_update: Sent update 47: fail-count-r0=1
Dec 18 23:19:53 [1107] db2 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-r0 (1418941193)
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/47, version=0.106.20)
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='2']//transient_attributes//nvpair[@name='last-failure-r0']: No such device or address (rc=-6, origin=local/attrd/48, version=0.106.20)
Dec 18 23:19:53 [1109] db2 crmd: notice: te_rsc_command: Initiating action 59: notify r0_pre_notify_stop_0 on db
Dec 18 23:19:53 [1109] db2 crmd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=99): Try again (6)
Dec 18 23:19:53 [1109] db2 crmd: info: abort_transition_graph: te_update_diff:172 - Triggered transition abort (complete=0, node=db2, tag=nvpair, id=status-2-fail-count-r0, name=fail-count-r0, value=1, magic=NA, cib=0.106.20) : Transient attribute: update
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_query operation for section /cib: OK (rc=0, origin=local/attrd/49, version=0.106.20)
Dec 18 23:19:53 [1107] db2 attrd: notice: attrd_perform_update: Sent update 50: last-failure-r0=1418941193
Dec 18 23:19:53 [1109] db2 crmd: info: abort_transition_graph: te_update_diff:172 - Triggered transition abort (complete=0, node=db2, tag=nvpair, id=status-2-last-failure-r0, name=last-failure-r0, value=1418941193, magic=NA, cib=0.106.21) : Transient attribute: update
Dec 18 23:19:53 [1104] db2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/50, version=0.106.21)
Dec 18 23:19:53 [1106] db2 lrmd: info: log_finished: finished - rsc:r0 action:notify call_id:216 pid:23916 exit-code:0 exec-time:98ms queue-time:0ms
Dec 18 23:19:53 [1109] db2 crmd: info: match_graph_event: Action r0_notify_0 (58) confirmed on db2 (rc=0)
Dec 18 23:19:53 [1109] db2 crmd: notice: process_lrm_event: LRM operation r0_notify_0 (call=216, rc=0, cib-update=0, confirmed=true) ok
Dec 18 23:19:53 [1104] db2 cib: info: crm_cs_flush: Sent 0 CPG messages (3 remaining, last=134): Try again (6)
Dec 18 23:19:53 [1109] db2 crmd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=99): Try again (6)
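Since restarting pacemaker and corosync does not help and we currently end up rebooting, I also looked at the manual split-brain recovery described in the DRBD documentation. My understanding is the sequence below (a sketch only; which node is the "victim" whose changes get discarded has to be decided case by case):

# on the node whose changes should be discarded (the split-brain victim):
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# on the surviving node, if its resource is in StandAlone state:
drbdadm connect r0

Is this the right way to recover without a reboot, or is the real fix to stop the cluster from splitting in the first place (fencing / token timeout, as asked above)?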