Lars Ellenberg wrote:
On Thu, Apr 01, 2010 at 12:12:47AM -0600, Alan Robertson wrote:
OK....
Since there was no ssh-as-root between the cluster nodes, I didn't
send all the logs along from every node in the cluster - and it
didn't occur to me to look at all of them.
However, the problem has gotten curiouser and curiouser - because ALL
the nodes in the cluster reported the same problem at the same
time...
That makes it a lot less likely to be a race condition with the disk
writing infrastructure...
I've attached the relevant lines from the various machines -
slightly processed (date stamp format changed and a few other minor
things).
Let me know if you want me to send all the system logs along...
There should be core files.
You should be able to get some interesting information out of them,
especially "the_cib" and "digest" at the point of abort().
Also, for my reference - what method are you using to compute the
digest of the file? That is, what command should I execute to get
the same results?
It's an md5sum over the xml tree -- not over the formatted ascii buffer,
though, so "md5sum cib.xml" won't do.
I think it is the same as
echo " $(perl -pe 's/^\s*(.*?)\s*\z/$1/g' cib.whatever)" | md5sum
But there is "cibadmin --md5-sum -x cib.xml",
to use the exact same code path.
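For anyone who would rather read the perl pipeline above as a few lines of Python, here is a rough sketch of what it appears to compute. The function name approx_cib_digest is made up for illustration, and this only mirrors the shell approximation (strip each line, join, add the space and newline that echo contributes); it is not the code path cibadmin uses, so "cibadmin --md5-sum -x cib.xml" remains the thing to trust.

```python
import hashlib


def approx_cib_digest(raw: bytes) -> str:
    """Approximate the shell pipeline from the thread (hypothetical helper)."""
    # perl -pe 's/^\s*(.*?)\s*\z/$1/g': strip leading and trailing whitespace
    # (including the newline) from every line and print the lines back to back.
    joined = b"".join(line.strip() for line in raw.split(b"\n"))
    # echo " $(...)": one leading space in front, one trailing newline after.
    return hashlib.md5(b" " + joined + b"\n").hexdigest()
```

Because indentation and line breaks are stripped before hashing, two files that differ only in formatting give the same value here, which is why a plain "md5sum cib.xml" will not match.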
This is a change from how it used to be (the last time I looked - at
least according to my not-always-reliable memory). Thanks for the update.
2010/03/31_19:02:52 vhost0384 [13294]: ERROR: crm_abort:
write_cib_contents: Triggered fatal assert at io.c:624 :
retrieveCib(tmp1, tmp2, FALSE) != NULL
So it did not verify right after it was written.
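To illustrate the kind of check that assert guards, here is a minimal write-then-verify sketch in Python. Everything in it is an assumption for illustration: the helper names, the ".sig" sidecar file, and hashing the raw bytes (the real digest is taken over the xml tree, as noted above). It is not Pacemaker's io.c, just the general pattern of re-reading a file right after writing it and refusing it if the digest does not match.

```python
import hashlib
import os
import tempfile


def write_with_digest(path: str, data: bytes) -> None:
    """Write data plus a sidecar digest file (hypothetical layout: path + '.sig')."""
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".sig", "w") as f:
        f.write(hashlib.md5(data).hexdigest())


def retrieve_verified(path: str):
    """Return the file contents, or None if they do not match the stored digest."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".sig") as f:
        expected = f.read().strip()
    return data if hashlib.md5(data).hexdigest() == expected else None


def demo():
    """Write a file, verify it, then corrupt it and verify again."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "cib.xml")
        write_with_digest(path, b"<cib/>\n")
        ok = retrieve_verified(path)       # matches the stored digest
        with open(path, "ab") as f:
            f.write(b"garbage")            # simulate on-disk corruption
        bad = retrieve_verified(path)      # no longer matches
    return ok, bad
```

In this sketch a None result from retrieve_verified plays the role of the failed "!= NULL" condition in the log line.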
Can you reproduce?
I have no idea. I didn't do anything much. Hopefully the test suite
does a lot more strenuous things...
The core files may actually contain some hints,
so have a look there.
None of them verified. All the nodes in the cluster failed the test at
the same time - and now I have no official CIBs on disk - on any cluster
nodes... I sent Andrew all the CIBs, and all the core files, and
basically everything under /var/lib/heartbeat/ from one machine.
They're from the latest official release - so the binaries that match
them are readily available.
Thanks Lars!
--
Alan Robertson <al...@unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce