On 16/05/2013, at 9:31 PM, Халезов Иван <i.khale...@rts.ru> wrote:

> On 16.05.2013 07:14, Andrew Beekhof wrote:
>> On 15/05/2013, at 9:53 PM, Халезов Иван <i.khale...@rts.ru> wrote:
>> 
>>> Hello everyone!
>>> 
>>> Some problems occurred while synchronising the CIB configuration to disk.
>>> I have these errors in Pacemaker's log file:
>> What were the messages before this?
>> Did it happen once or many times?
>> At startup or while the cluster was running?
> 
> I had updated the cluster configuration beforehand, so the log file contains 
> some output about it (not from the beginning here, because it is rather big):

I'm guessing some whitespace crept into the configuration.
We've had problems with that in the past;
https://github.com/beekhof/pacemaker/commit/c2550cbd33a3b2ab7efcd6ef516ba124fbae9a81
is one patch that you don't have, for example.
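
To illustrate why stray whitespace matters here: a digest is computed over the
serialized bytes, not over the XML's meaning, so two documents that parse
identically can still hash differently. This is only a rough sketch. It assumes
an MD5-style digest (the 32-hex-character values in your log are consistent
with that), and the helper name digest() and the sample nvpair strings are
made up for the example; it is not the actual Pacemaker code path.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest(xml_text: str) -> str:
    """Hypothetical helper: hex MD5 of the serialized XML bytes."""
    return hashlib.md5(xml_text.encode("utf-8")).hexdigest()


# Two serializations of the same element; they differ by one space before "/>".
compact = '<nvpair id="x" name="target-role" value="Started"/>'
padded = '<nvpair id="x" name="target-role" value="Started" />'

# Parsed, the two elements carry identical attributes...
same_meaning = ET.fromstring(compact).attrib == ET.fromstring(padded).attrib
# ...but the digests of the raw text do not match.
same_digest = digest(compact) == digest(padded)

print(same_meaning, same_digest)  # prints: True False
```

So if the text that is written to disk differs by even one whitespace
character from the text the expected digest was calculated over, the
comparison fails even though the configuration itself is unchanged.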

> 
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <primitive 
> id="Security_A" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <meta_attributes 
> id="Security_A-meta_attributes" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <nvpair 
> id="Security_A-meta_attributes-target-role" name="target-role" 
> value="Stopped" __crm_diff_marker__="removed:top" />
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </primitive>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <primitive 
> id="Security_B" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <meta_attributes 
> id="SPBEX_Security_B-meta_attributes" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - <nvpair 
> id="Security_B-meta_attributes-target-role" name="target-role" 
> value="Started" __crm_diff_marker__="removed:top" />
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </primitive>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </group>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </resources>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </configuration>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: - </cib>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <cib epoch="496" 
> num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" 
> cib-last-written="Mon May 13 18:50:25 2013" crm_feature_set="3.0.6" 
> update-origin="iblade6.net.rts" update-client="cibadmin" have-quorum="1" 
> dc-uuid="2130706433" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <configuration >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <resources >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <group 
> id="FAST_SENDERS" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <meta_attributes 
> id="FAST_SENDERS-meta_attributes" __crm_diff_marker__="added:top" >
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + <nvpair 
> id="FAST_SENDERS-meta_attributes-target-role" name="target-role" 
> value="Started" />
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </meta_attributes>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </group>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </resources>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </configuration>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib:diff: + </cib>
> May 14 13:29:13 iblade6 cib[2848]:     info: cib_process_request: Operation 
> complete: op cib_replace for section resources (origin=local/cibadmin/2, 
> version=0.496.1): ok (rc=0)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
> Trades_INCR_A#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
> Trades_INCR_B#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
> Security_A#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 pengine[2852]:   notice: LogActions: Start 
> Security_B#011(iblade6.net.rts)
> May 14 13:29:13 iblade6 crmd[2853]:   notice: do_state_transition: State 
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
> cause=C_IPC_MESSAGE origin=handle_response ]
> May 14 13:29:13 iblade6 crmd[2853]:     info: do_te_invoke: Processing graph 
> 41 (ref=pe_calc-dc-1368523753-125) derived from 
> /var/lib/pengine/pe-input-452.bz2
> May 14 13:29:13 iblade6 crmd[2853]:     info: te_rsc_command: Initiating 
> action 80: start Trades_INCR_A_start_0 on iblade6.net.rts (local)
> May 14 13:29:13 iblade6 cluster:    error: validate_cib_digest: Digest 
> comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 
> (/var/lib/heartbeat/crm/cib.Zm249H), calculated 
> bc160870924630b3907c8cb1c3128eee
> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Checksum of 
> /var/lib/heartbeat/crm/cib.a024wF failed!  Configuration contents ignored!
> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Usually this is 
> caused by manual changes, please refer to 
> http://clusterlabs.org/wiki/FAQ#cib_changes_detected
> May 14 13:29:13 iblade6 cluster:    error: crm_abort: write_cib_contents: 
> Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
> May 14 13:29:13 iblade6 pengine[2852]:   notice: process_pe_message: 
> Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disk 
> write failed: status=134, signo=6, exitcode=0
> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: 
> Disabling disk writes after write failure
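
For what it's worth, the "status=134, signo=6" above fits the assert: it reads
like a wait status for a child killed by SIGABRT with a core dump. The sketch
below decodes it, assuming the conventional Linux wait-status layout (low 7
bits hold the signal number, bit 0x80 is the core-dump flag); the variable
names are just for illustration.

```python
import signal

status = 134  # value reported by cib_diskwrite_complete in the log

# Assumed Linux layout: low 7 bits = terminating signal, 0x80 = core dumped.
signo = status & 0x7F
core_dumped = bool(status & 0x80)

# Look up the symbolic name of the signal number.
signal_name = signal.Signals(signo).name

print(signo, signal_name, core_dumped)
```

In other words, the process writing the CIB to disk aborted after the failed
assert, rather than returning an ordinary exit code.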
> 
> 
> It happened twice during the last week, both times while the cluster was running.
> 
>>> May 14 13:29:13 iblade6 cluster:    error: validate_cib_digest: Digest 
>>> comparision failed: expected 2c91194022c98636f90df9dd5e7176c6 
>>> (/var/lib/heartbeat/crm/cib.Zm249H), calculated 
>>> bc160870924630b3907c8cb1c3128eee
>>> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Checksum of 
>>> /var/lib/heartbeat/crm/cib.a024wF failed!  Configuration contents ignored!
>>> May 14 13:29:13 iblade6 cluster:    error: retrieveCib: Usually this is 
>>> caused by manual changes, please refer to 
>>> http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>> May 14 13:29:13 iblade6 cluster:    error: crm_abort: write_cib_contents: 
>>> Triggered fatal assert at io.c:662 : retrieveCib(tmp1, tmp2, FALSE) != NULL
>>> May 14 13:29:13 iblade6 pengine[2852]:   notice: process_pe_message: 
>>> Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
>>> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: Disk 
>>> write failed: status=134, signo=6, exitcode=0
>>> May 14 13:29:13 iblade6 cib[2848]:    error: cib_diskwrite_complete: 
>>> Disabling disk writes after write failure
>>> 
>>> 
>>> I didn't find anything about it at this link: 
>>> http://clusterlabs.org/wiki/FAQ#cib_changes_detected
>>> 
>>> What could be the reason for this error?
>>> Why would the checksum of a CIB file be wrong?
>>> Is it a problem with the HDD, a Pacemaker bug, or something else? (There are 
>>> no disk or filesystem errors in syslog.)
>>> 
>>> I had a couple of such incidents during the last week.
>>> 
>>> 
>>> My cluster installation:  CentOS 6.4 x86_64, pacemaker 1.1.7, corosync 2.3.0
>>> 
>>> Thank you in advance!
>>> 
>>> Ivan Khalezov.
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
> 
> Ivan Khalezov
> 
> 

