On 21 Mar 2014, at 3:57 am, Drapeau, Mathieu <mathieu.drap...@intel.com> wrote:
> Hello,
> From pacemaker 1.1.8-7 from EL6, crmd died unexpected generating this logs
> during a failover:

Please update to 1.1.10 from the EL6 update channels:

http://blog.clusterlabs.org/blog/2014/potential-for-data-corruption-in-pacemaker-1-dot-1-6-through-1-dot-1-9/

>
> crmd[10419]: error: crmd_node_update_complete: Node update 79 failed:
> Timer expired (-62)

It looks like your hardware is overloaded and an operation that shouldn't have taken very long has timed out.

> crmd[10419]: error: do_log: FSA: Input I_ERROR from
> crmd_node_update_complete() received in state S_IDLE
> crmd[10419]: notice: do_state_transition: State transition S_IDLE ->
> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
> origin=crmd_node_update_complete ]
> crmd[10419]: warning: do_recover: Fast-tracking shutdown in response to
> errors
> crmd[10419]: warning: do_election_vote: Not voting in election, we're in
> state S_RECOVERY
> crmd[10419]: error: do_log: FSA: Input I_TERMINATE from do_recover()
> received in state S_RECOVERY
> crmd[10419]: notice: lrm_state_verify_stopped: Stopped 0 recurring
> operations at shutdown (2 ops remaining)
> crmd[10419]: notice: lrm_state_verify_stopped: Recurring action
> testfs-MDT0000_6cda68:21 (testfs-MDT0000_6cda68_monitor_5000) incomplete at
> shutdown
> crmd[10419]: notice: lrm_state_verify_stopped: Recurring action
> MGS_f055b7:30 (MGS_f055b7_monitor_5000) incomplete at shutdown
> crmd[10419]: error: lrm_state_verify_stopped: 3 resources were active at
> shutdown.
> crmd[10419]: notice: do_lrm_control: Disconnected from the LRM
> crmd[10419]: notice: terminate_cs_connection: Disconnecting from Corosync
> corosync[10370]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x2589f40,
> async-conn=0x2589f40) left
> crmd[10419]: error: crmd_fast_exit: Could not recover from internal error
> pacemakerd[10408]: error: pcmk_child_exit: Child process crmd (10419)
> exited: Generic Pacemaker error (201)
> pacemakerd[10408]: notice: pcmk_process_exit: Respawning failed child
> process: crmd
>
> What could have happened and how to avoid crmd to die?
>
> Thanks,
> Mat
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
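
As a rough illustration of the version advice in this reply (not part of the original message), here is a minimal Python sketch that checks whether a Pacemaker version string falls in the 1.1.6 through 1.1.9 range named in the linked post's URL. The function names are hypothetical, and the range bounds are taken only from that URL, so treat them as an assumption.

```python
import re

# Assumption: affected range read from the linked post's URL
# ("1-dot-1-6-through-1-dot-1-9"); not verified against the post itself.
AFFECTED_MIN = (1, 1, 6)
AFFECTED_MAX = (1, 1, 9)


def parse_version(text):
    """Return the leading 'X.Y.Z' of a version string as a tuple of ints.

    A package release suffix such as the '-7' in '1.1.8-7' is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def in_affected_range(version_text):
    """True if the version lies within AFFECTED_MIN..AFFECTED_MAX inclusive."""
    return AFFECTED_MIN <= parse_version(version_text) <= AFFECTED_MAX


if __name__ == "__main__":
    for candidate in ("1.1.8-7", "1.1.10"):
        print(candidate, in_affected_range(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.1.10" would sort before "1.1.9".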