Sunil,

I set up netconsole and verified that the machines are fencing. Thank you for your assistance.

-Daniel

On May 26, 2010, at 3:21 PM, Sunil Mushran wrote:

> On 05/26/2010 01:39 PM, Daniel McDonald wrote:
>>
>>> ocfs2 does not reset without a log message. Do you have netconsole
>>> setup? Messages logged a tick before reset can only be captured by
>>> netconsole/kdump etc.
>>>
>> Unfortunately no. Here are the two lines in /var/log/message prior to the
>> un-intended reboot and then syslog restarting:
>>
>> May 25 22:26:03 ST2540_X4450_1 kernel: ocfs2_dlm: Nodes in domain
>> ("7CCC109F8F16433DB7DB79526A29375A"): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
>> May 26 04:05:27 ST2540_X4450_1 init: Trying to re-exec init
>> May 26 11:49:31 ST2540_X4450_1 syslogd 1.4.1: restart.
>>
>> At approximately 11:46:34 a fibre cable was intentionally pulled out of the
>> SAN. Prior to that, all 15 OCFS2 nodes were performing I/O operations
>> with OCFS2 volumes on the SAN. 8 or so nodes fenced, but two simply
>> reboot.
>>
>> Any ideas? I'm curious as to if you believe this reboot could be attributed
>> to OCFS2 or possibly a separate issue. I was surprised to see some
>> machines with fencing messages and then these two without.
>>
>> fyi, the same test, when performed with a disk heartbeat threshold of 61,
>> did not result in any nodes dropping off.
>>
>
> The only way to know for sure is to get the netconsole logs. As in,
> if ocfs2 is the cause of the reboot, then netconsole will capture it.
> If not, then it is likely something else. If you are going to use this in
> a prod environment, you should seriously consider setting up an
> old box as a netconsole server to capture the logs.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users