On 16/05/2013, at 9:31 PM, Халезов Иван <i.khale...@rts.ru> wrote:
> On 16.05.2013 07:14, Andrew Beekhof wrote:
>> On 15/05/2013, at 9:53 PM, Халезов Иван <i.khale...@rts.ru> wrote:
>>
>>> Hello everyone!
>>>
>>> Some problems occurred while synchronising the CIB configuration to disk.
>>> I have these errors in pacemaker's logfile:
>> What were the messages before this?
>> Did it happen once or many times?
>> At startup or while the cluster was running?
>
> I had updated the cluster configuration beforehand, so there was some output
> about it in the logfile (not from the beginning here, because it is rather big):

I'm guessing some whitespace crept into the configuration.
We've had problems with that in the past;
https://github.com/beekhof/pacemaker/commit/c2550cbd33a3b2ab7efcd6ef516ba124fbae9a81
is one patch that you don't have, for example.

>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <primitive id="Security_A" >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <meta_attributes id="Security_A-meta_attributes" >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <nvpair id="Security_A-meta_attributes-target-role" name="target-role" value="Stopped" __crm_diff_marker__="removed:top" />
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </primitive>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <primitive id="Security_B" >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <meta_attributes id="SPBEX_Security_B-meta_attributes" >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - <nvpair id="Security_B-meta_attributes-target-role" name="target-role" value="Started" __crm_diff_marker__="removed:top" />
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </primitive>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </group>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </resources>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </configuration>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: - </cib>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <cib epoch="496" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Mon May 13 18:50:25 2013" crm_feature_set="3.0.6" update-origin="iblade6.net.rts" update-client="cibadmin" have-quorum="1" dc-uuid="2130706433" >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <configuration >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <resources >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <group id="FAST_SENDERS" >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <meta_attributes id="FAST_SENDERS-meta_attributes" __crm_diff_marker__="added:top" >
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + <nvpair id="FAST_SENDERS-meta_attributes-target-role" name="target-role" value="Started" />
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </group>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </resources>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </configuration>
> May 14 13:29:13 iblade6 cib[2848]: info: cib:diff: + </cib>
> May 14 13:29:13 iblade6 cib[2848]: info: cib_process_request: Operation complete: op cib_replace for section resources (origin=local/cibadmin/2, version=0.496.1): ok (rc=0)
> May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start Trades_INCR_A#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start Trades_INCR_B#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start Security_A#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]: notice: LogActions: Start Security_B#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 crmd[2853]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> May 14 13:29:13 iblade6 crmd[2853]: info: do_te_invoke: Processing graph 41 (ref=pe_calc-dc-1368523753-125) derived from /var/lib/pengine/pe-input-452.bz2
> May 14 13:29:13 iblade6 crmd[2853]: info: te_rsc_command: Initiating action 80: start Trades_INCR_A_start_0 on iblade6.net.rts (local)
> May 14 13:29:13 iblade6 cluster: error: validate_cib_digest: Digest comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 (/var/lib/heartbeat/crm/cib.Zm249H), calculated bc160870924630b3907c8cb1c3128eee
> May 14 13:29:13 iblade6 cluster: error: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.a024wF failed! Configuration contents ignored!
> May 14 13:29:13 iblade6 cluster: error: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected
> May 14 13:29:13 iblade6 cluster: error: crm_abort: write_cib_contents: Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
> May 14 13:29:13 iblade6 pengine[2852]: notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
> May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete: Disk write failed: status=134, signo=6, exitcode=0
> May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete: Disabling disk writes after write failure
>
> It happened twice during the last week, both times while the cluster was running.
>
>>> May 14 13:29:13 iblade6 cluster: error: validate_cib_digest: Digest comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 (/var/lib/heartbeat/crm/cib.Zm249H), calculated bc160870924630b3907c8cb1c3128eee
>>> May 14 13:29:13 iblade6 cluster: error: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.a024wF failed! Configuration contents ignored!
>>> May 14 13:29:13 iblade6 cluster: error: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>> May 14 13:29:13 iblade6 cluster: error: crm_abort: write_cib_contents: Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
>>> May 14 13:29:13 iblade6 pengine[2852]: notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
>>> May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete: Disk write failed: status=134, signo=6, exitcode=0
>>> May 14 13:29:13 iblade6 cib[2848]: error: cib_diskwrite_complete: Disabling disk writes after write failure
>>>
>>> I didn't find anything about it at this link:
>>> http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>>
>>> What could be the reason for this error?
>>> Why could the checksum of a CIB file be wrong?
>>> Is it a problem with the HDD, a Pacemaker bug, or something else? (There are no
>>> disk or filesystem errors in syslog.)
>>>
>>> I had a pair of such incidents during the last week.
>>>
>>> My cluster installation: CentOS 6.4 x86_64, pacemaker 1.1.7, corosync 2.3.0
>>>
>>> Thank you in advance!
>>>
>>> Ivan Khalezov.
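On the question of why the checksum of a CIB file can come out wrong: a digest is computed over the exact bytes of the serialized XML, so two files that describe the same configuration hash differently as soon as one of them picks up stray whitespace. The sketch below only illustrates that effect. MD5 is an assumption here (the 32-character hex digests in the log are consistent with it), the XML fragments are made up, and `digest` is a hypothetical helper, not Pacemaker's own code path.

```python
import hashlib


def digest(xml_text: str) -> str:
    """Return the hex MD5 digest of the exact serialized bytes (illustrative only)."""
    return hashlib.md5(xml_text.encode("utf-8")).hexdigest()


# Hypothetical fragments: the same element, differing only by one space before "/>".
compact = '<nvpair id="x-target-role" name="target-role" value="Started"/>'
padded = '<nvpair id="x-target-role" name="target-role" value="Started" />'

# An XML parser treats both as the same element, but a byte-level digest does not.
digests_differ = digest(compact) != digest(padded)
print(digest(compact))
print(digest(padded))
print("digests differ:", digests_differ)
```

If the stored digest was calculated from one serialization and the file on disk was produced by another, a comparison like this fails even though nobody edited the configuration by hand.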
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>
> Ivan Khalezov
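As a side note on the "Disk write failed: status=134, signo=6, exitcode=0" line: those numbers are consistent with the fatal assert logged just before it. Under the conventional Linux wait-status layout (an assumption here), 134 is 128 + 6, which is termination by signal 6 (SIGABRT on Linux) with the core-dump bit set and no exit code. A minimal sketch of that arithmetic, where `decode_wait_status` is a hypothetical helper and not anything from Pacemaker:

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw wait status into its parts, assuming the usual Linux bit layout."""
    signo = status & 0x7F              # low 7 bits: terminating signal (0 if none)
    core_dumped = bool(status & 0x80)  # bit 7: core-dump flag
    exitcode = (status >> 8) & 0xFF    # next 8 bits: exit code for a normal exit
    return {"signo": signo, "core_dumped": core_dumped, "exitcode": exitcode}


# The value reported by cib_diskwrite_complete in the log above.
decoded = decode_wait_status(134)
print(decoded)
```

So the child process that writes the CIB did not fail on disk I/O as such; it aborted on the assert after the digest check failed.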