On 29 May 2014, at 12:28 pm, Yusuke Iida <yusk.i...@gmail.com> wrote:
> Hi, Andrew
>
> I'm sorry.
> It seems that the notation of the node name became another by syslog.
> In order to dispel misunderstanding, the report was newly acquired.
> I think that the signs are appearing in vm02/ha-log.

Got it :)

Ok, step 1 - stop logging debug.
Debug is accounting for 30% of the logs and all that writing to disk would be adding significantly to the cluster's workload.

Question: How have you got logging configured? Anything in /etc/sysconfig/pacemaker ?
I ask because pacemaker.log appears to have a jumble of syslog and regular file output:

May 29 10:45:26 vm02 cib[25603]: info: cib_perform_op: + /cib: @num_updates=1295
May 29 10:45:26 [25603] vm02 cib: info: cib_perform_op: + /cib: @num_updates=1295

Step 2 - can you try this patch:

diff --git a/crmd/te_callbacks.c b/crmd/te_callbacks.c
index 4d330a6..eba5f11 100644
--- a/crmd/te_callbacks.c
+++ b/crmd/te_callbacks.c
@@ -381,12 +381,15 @@ te_update_diff(const char *event, xmlNode * msg)
 
         } else if(strstr(xpath, "/cib/configuration")) {
             abort_transition(INFINITY, tg_restart, "Non-status change", change);
+            break; /* Wont be packaged with any resource operations we may be waiting for */
 
         } else if(strstr(xpath, "/"XML_CIB_TAG_TICKETS) || safe_str_eq(name, XML_CIB_TAG_TICKETS)) {
             abort_transition(INFINITY, tg_restart, "Ticket attribute change", change);
+            break; /* Wont be packaged with any resource operations we may be waiting for */
 
         } else if(strstr(xpath, "/"XML_TAG_TRANSIENT_NODEATTRS"[") || safe_str_eq(name, XML_TAG_TRANSIENT_NODEATTRS)) {
             abort_transition(INFINITY, tg_restart, "Transient attribute change", change);
+            break; /* Wont be packaged with any resource operations we may be waiting for */
 
         } else if(strstr(xpath, "/"XML_LRM_TAG_RSC_OP"[") && safe_str_eq(op, "delete")) {
             crm_action_t *cancel = NULL;

>
> May 29 10:43:37 vm02 crmd[25608]: error: config_query_callback:
> Local CIB query resulted in an error: Timer expired
> May 29 10:43:37 vm02 crmd[25608]: info: register_fsa_error_adv:
> Resetting the current action list
> May 29 10:43:37 vm02 crmd[25608]: error: do_log: FSA: Input I_ERROR
> from config_query_callback() received in state S_POLICY_ENGINE
> May 29 10:43:37 vm02 crmd[25608]: warning: do_state_transition: State
> transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR
> cause=C_FSA_INTERNAL origin=config_query_callback ]
> May 29 10:43:37 vm02 crmd[25608]: warning: do_recover: Fast-tracking
> shutdown in response to errors
> May 29 10:43:37 vm02 crmd[25608]: warning: do_election_vote: Not
> voting in election, we're in state S_RECOVERY
>
> https://drive.google.com/file/d/0BwMFJItoO-fVSEd2MkRiOGxkelk/edit?usp=sharing
>
> Regards,
> Yusuke
>
> 2014-05-29 10:26 GMT+09:00 Andrew Beekhof <and...@beekhof.net>:
>>
>> On 28 May 2014, at 6:42 pm, Yusuke Iida <yusk.i...@gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> I made the cluster load a setup to which 256 resources are started using
>>> crmsh.
>>> At this time, crmd changed into the S_RECOVERY state and rebooted.
>>>
>>> May 28 17:08:00 [14194] vm02 crmd: error:
>>> config_query_callback: Local CIB query resulted in an error: Timer
>>> expired
>>> May 28 17:08:00 [14194] vm02 crmd: info:
>>> register_fsa_error_adv: Resetting the current action list
>>> May 28 17:08:00 [14194] vm02 crmd: error: do_log: FSA: Input
>>> I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>> May 28 17:08:00 [14194] vm02 crmd: warning:
>>> do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [
>>> input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>> May 28 17:08:00 [14194] vm02 crmd: warning: do_recover:
>>> Fast-tracking shutdown in response to errors
>>> May 28 17:08:00 [14194] vm02 crmd: warning: do_election_vote:
>>> Not voting in election, we're in state S_RECOVERY
>>>
>>> I think that query performed in large quantities cannot be processed.
>>> Before implementing cib_performance, abort_transition() was called only
>>> once.
>>>
>>> Is this corrected?
>>>
>>> report when a problem occurs is attached.
>>> https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing
>>
>> That doesn't appear to match the symptoms above.
>>
>>>
>>> Regards,
>>> Yusuke
>>> --
>>> ----------------------------------------
>>> METRO SYSTEMS CO., LTD
>>>
>>> Yusuke Iida
>>> Mail: yusk.i...@gmail.com
>>> ----------------------------------------
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>
> --
> ----------------------------------------
> METRO SYSTEMS CO., LTD
>
> Yusuke Iida
> Mail: yusk.i...@gmail.com
> ----------------------------------------
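
To illustrate what the added `break` lines in the step 2 patch change, here is a minimal standalone C sketch. It is not the Pacemaker code: `process_changes`, its `stop_early` flag and the `aborts` counter are hypothetical names invented for this illustration, and only the "/cib/configuration" check is modelled. The assumption, as I read the patch, is that once one change has forced a transition restart, the remaining changes in the same diff no longer need to be examined, so the abort is requested once instead of once per change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk a list of changed xpaths, loosely modelled on the loop in
 * te_update_diff(). Every configuration change increments *aborts,
 * standing in for a call to abort_transition(). When stop_early is
 * non-zero, the first such change ends the loop, which is what the
 * added `break` lines do. Returns the number of changes examined.
 * All names here are hypothetical, not Pacemaker API.
 */
size_t process_changes(const char *const *xpaths, size_t n,
                       int stop_early, int *aborts)
{
    size_t examined = 0;

    assert(xpaths != NULL && aborts != NULL);
    *aborts = 0;

    for (size_t i = 0; i < n; i++) {
        examined++;
        if (strstr(xpaths[i], "/cib/configuration") != NULL) {
            (*aborts)++;        /* stand-in for abort_transition(...) */
            if (stop_early) {
                break;          /* nothing later in this diff matters */
            }
        }
    }
    return examined;
}
```

With three changes of which the first two touch the configuration, a call with `stop_early` set examines one change and counts one abort, while a call without it examines all three and counts two.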