Matthew Hagerty wrote:
Can anyone shed some light on this, give me some options to try? What
happened to kernel panics and such when there were serious errors
going on? The only glimmer of information I have is that *one* time
there was an error on the console about there not being any RAID
controller available. I did purchase a spare controller and I'm about
to swap it out and see if it helps, but for some reason I doubt it.
If a controller like that was failing, I would certainly hope to see
some serious error messages or panics going on.
I have been running FreeBSD since version 1.01 and have never had a
box so unstable in the last 12 or so years, especially one that is
supposed to be "server" quality instead of the make-shift ones I put
together with desktop hardware. And lastly, I'm getting sick of my
Linux admin friends telling me "told you so! should have run
Linux..." - please give me something to stick in their pie holes!
Several times now I have had Linux servers (production-quality ones,
not ones I built myself :-)) die in a somewhat similar fashion. In
every case the cause has been a flaky disk, a flaky disk controller,
or some combination of the two.
What seems to happen is that the disk is entirely "lost" by the OS. At
that point any process which never touches the disk (i.e. everything it
needs is already in memory) keeps running, but the moment a process
tries to access the disk it locks up. So you can't ssh in to the
server, but if you happen to be logged in already, your shell is
probably cached and keeps working. If you typed ls recently, for
example, you can run ls again (but see nothing, or get a cryptic error
like "I/O error").
Clearly nothing is logged, as the disk has gone AWOL. Often the
machines behaved fine after a reboot and then did the same thing some
time later. In one case the supposedly transparent "RAID-1" array was
completely broken, but Linux logged precisely nothing to tell you :-(
You can stick that wherever you like with your Linux friends :-O
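If you want some warning the next time this happens, one thing you
could run (from a serial console or an ssh session you keep open, since
syslog on the dead disk won't help) is a tiny disk-liveness watchdog.
Below is a rough, untested sketch of what I mean - the file name,
timeouts and messages are placeholders I made up, not anything from
your machine. A child process writes and fsyncs a small file on the
suspect filesystem; the parent never touches the disk itself, so it can
still get a message out when the disk goes away.

/*
 * Rough, untested sketch of a disk-liveness watchdog.  The child does
 * the disk I/O; the parent only watches the clock, so it keeps running
 * even when the disk has gone AWOL.
 */
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define PROBE_FILE     "/var/tmp/disk-probe" /* file on the suspect filesystem */
#define PROBE_TIMEOUT  15                    /* seconds before declaring the disk gone */
#define PROBE_INTERVAL 30                    /* seconds between probes */

static int
probe_child(void)
{
    int fd = open(PROBE_FILE, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return (1);
    if (write(fd, "ping\n", 5) != 5 || fsync(fd) != 0) {
        close(fd);
        return (1);
    }
    close(fd);
    return (0);
}

int
main(void)
{
    for (;;) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(probe_child());
        if (pid < 0) {
            fprintf(stderr, "fork failed: %s\n", strerror(errno));
            sleep(PROBE_INTERVAL);
            continue;
        }

        int waited = 0, status = 0;
        pid_t done = 0;

        while (waited < PROBE_TIMEOUT &&
            (done = waitpid(pid, &status, WNOHANG)) == 0) {
            sleep(1);
            waited++;
        }

        if (done == 0) {
            /* Child is stuck, almost certainly in disk wait. */
            fprintf(stderr, "%ld: disk probe hung for %d s - "
                "disk or controller gone?\n",
                (long)time(NULL), PROBE_TIMEOUT);
            /* A process stuck in disk wait can't be killed; leave it. */
        } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
            fprintf(stderr, "%ld: disk probe got an error\n",
                (long)time(NULL));
        }

        sleep(PROBE_INTERVAL);
    }
}

The reason for doing the probe in a child rather than with alarm() in
the same process is that a process stuck in disk wait usually can't be
interrupted by a signal, so the timeout would never fire.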
This somewhat fits your symptoms. If the disk vanished, then all those
postgres processes would probably hang unless everything they needed
happened to be cached in RAM. The web server and the PHP scripts
probably are cached in RAM if they are called frequently, so you might
well see lots of postgres processes stacked up.
LSI MegaRAID has a CLI of sorts in sysutils/megarc. You might start
with that (and check the RAID BIOS next time the machine reboots).
I'd say that if you have an alternative RAID controller, swapping it in
would be a good place to start. If LSI do any standalone diagnostics,
you could try those too.
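You could also poll the controller periodically and shout when its idea
of the world changes. I don't remember megarc's exact flags offhand, so
treat the command string in this untested sketch as a placeholder and
substitute whatever megarc's usage output says dumps the array state:

/*
 * Rough sketch: run the controller status command every few minutes
 * and complain when the output changes.  STATUS_CMD is a guess, not a
 * verified megarc invocation - check megarc's own usage output.
 */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define STATUS_CMD "megarc -dispCfg -a0"  /* placeholder - verify against megarc usage */
#define INTERVAL   300                    /* seconds between checks */
#define BUFSZ      65536

static size_t
run_status(char *buf, size_t len)
{
    FILE *p = popen(STATUS_CMD, "r");
    size_t n;

    if (p == NULL)
        return (0);
    n = fread(buf, 1, len - 1, p);
    buf[n] = '\0';
    pclose(p);
    return (n);
}

int
main(void)
{
    static char prev[BUFSZ], cur[BUFSZ];

    run_status(prev, sizeof(prev));
    for (;;) {
        sleep(INTERVAL);
        if (run_status(cur, sizeof(cur)) == 0) {
            fprintf(stderr, "%ld: could not run the status command at all\n",
                (long)time(NULL));
            continue;
        }
        if (strcmp(prev, cur) != 0) {
            fprintf(stderr, "%ld: controller status changed:\n%s\n",
                (long)time(NULL), cur);
            memcpy(prev, cur, sizeof(prev));
        }
    }
}

Even just dumping that output somewhere off the box every few minutes
would have told you roughly when the array went bad.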
--Alex
PS Kernels usually panic when some internal state is just too wrong to
continue. A disk or even a controller disappearing isn't going to make
the internal state wrong - it's just a device gone missing - so I would
not be surprised if the machine just locked up.