[EMAIL PROTECTED] wrote:
Hello All,

I've recently fallen into the task of administering a FreeBSD 5.4-RELEASE box that acts as the web server for a small non-profit that I volunteer for. Unfortunately, the system has been having some extremely vexing stability issues over the last month or so, which even my 6+ years of experience as an OpenBSD admin have not helped me track down.

First things first, let me state explicitly that I'm not claiming "FreeBSD sucks, it's not stable" or anything like that. It's a fine OS, and I'm sure that either faulty hardware or some sort of misconfiguration is causing these problems. :-)

That said, here are some of the symptoms the box has been experiencing:

* Occasional random reboots. I've only personally witnessed one, and they don't happen often, but any time a *NIX box just reboots for no apparent reason (there was no indication of a problem in any of the logs, at least that I could see), something really bad is going on.

* Random extreme slowness when logging in via SSH, with the time to get a shell ranging from a second or two all the way up to 80 seconds. The box isn't busy enough that it's just slow due to load (especially since, once you're in, things fly), and it's not just a reverse DNS issue like I've seen on OpenBSD (this occurs even when logging in from locations listed in /etc/hosts that resolve properly out of that file). Until I upgraded to the current version of OpenSSL/OpenSSH, the box would occasionally just become unresponsive altogether over SSH, not allowing logins for 15+ minutes at a time.

* Files that are sometimes not found at startup but are found at other times. Prime example: after a planned soft reboot, the Zope CMS failed to find libmysqlclient.so, but it found it with no trouble on a subsequent boot a few minutes later, with no config changes in between.

* A warning in /var/log/messages that the root filesystem was full when it was only at 60% capacity (and roughly 2% inode usage). The problem has yet to recur, even though no files have been removed from that filesystem.

* Random crashes of the Zope/Plone system that's running the main part of the web site. While I realize that, in and of itself, this means nothing about the stability of the underlying OS, in the context of all of the other things going on (as well as the fact that the Zope list has been unable to help figure out why it's crashing), it seems like it might be further evidence of a larger problem.
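One way to narrow down the slow SSH logins is to time how long sshd takes to send its version banner. The banner comes early in the connection, before key exchange and user authentication, so comparing the banner delay with the total login time roughly separates delays in accepting the connection (including any TCP-wrappers or DNS checks done at that stage) from delays later in authentication or session setup. This is a minimal sketch, assuming Python is available on the client machine; the function name `ssh_banner_latency` is hypothetical.

```python
import socket
import time


def ssh_banner_latency(host: str, port: int = 22, timeout: float = 120.0):
    """Return (connect_seconds, banner_seconds, first_line) for an SSH server.

    connect_seconds: time until the TCP handshake completes.
    banner_seconds:  time from the start until the first full line arrives.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        sock.settimeout(timeout)
        data = b""
        # The server's identification line is terminated by a newline (CR LF).
        while b"\n" not in data:
            chunk = sock.recv(256)
            if not chunk:
                break  # server closed the connection before sending a full line
            data += chunk
        done = time.monotonic()
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", errors="replace")
    return connected - start, done - start, first_line
```

Running it from the DC end during one of the slow spells, e.g. `ssh_banner_latency("server.example.org")` (the hostname is a placeholder), would show whether the wait happens before or after sshd sends its banner.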

Thus far, beyond scanning log files and constantly watching "top" and "ps", I haven't been able to do much with the box. As I said, I upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the firewall (there was none before I arrived... don't even get me started on that). Since the server is in Oregon and I'm in the DC area, the previous admin will run Memtest for me this weekend and disable hyperthreading (there's no performance justification for it here, and it has caused me stability issues, at least on Linux, in the past). Because this is a production box, that's about the extent of what I've been able to do to date.
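Random reboots are hard to diagnose partly because nothing useful is left in the logs. A cheap complement to watching top and ps by hand is a small logger that records the load average and root-filesystem usage (blocks and inodes) at a fixed interval and fsyncs each line, so the last entries before a reboot are more likely to reach the disk; it would also capture block and inode figures around the time of another "filesystem full" message. This is a minimal sketch, assuming Python is installed on the server; the function names and the key=value log format are hypothetical.

```python
import os
import time


def snapshot(path: str = "/") -> dict:
    """Return load averages plus block and inode usage (in percent) for `path`."""
    st = os.statvfs(path)
    load1, load5, load15 = os.getloadavg()
    # Block capacity roughly as df(1) reports it: used / (used + available to non-root).
    used_blocks = st.f_blocks - st.f_bfree
    denom = used_blocks + st.f_bavail
    block_pct = 100.0 * used_blocks / denom if denom else 0.0
    # Inode usage: inodes in use / total inodes (some filesystems report 0 total).
    used_inodes = st.f_files - st.f_ffree
    inode_pct = 100.0 * used_inodes / st.f_files if st.f_files else 0.0
    return {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "load1": round(load1, 2),
        "load5": round(load5, 2),
        "load15": round(load15, 2),
        "block_pct": round(block_pct, 1),
        "inode_pct": round(inode_pct, 1),
    }


def append_snapshot(logfile: str, path: str = "/") -> None:
    """Append one key=value line and fsync it so it is on disk before any crash."""
    line = " ".join(f"{k}={v}" for k, v in snapshot(path).items()) + "\n"
    with open(logfile, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def log_forever(logfile: str, path: str = "/", interval: float = 60.0) -> None:
    """Take a snapshot every `interval` seconds until killed."""
    while True:
        append_snapshot(logfile, path)
        time.sleep(interval)
```

For example, `log_forever("/var/log/health.log")` could be left running under nohup; the log path and the 60-second interval are arbitrary choices.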

What I'd like to know from you guys is:

* Am I justified in suspecting hyperthreading as a potential cause of instability?

* Does 5.4-RELEASE have any known bugs that might cause stability issues like the ones I've described here? More importantly, would an upgrade to 6.2-RELEASE be worthwhile (as is my instinct), in terms of being generally more stable and/or having better hardware support? Would such an upgrade be possible/relatively painless to perform without being physically at a console, as has been the case with OpenBSD over the years?

* Given my dmesg below, do you see any specific problems?

* Do you have any other suggestions for debugging this problem?

Thanks in advance for any help you can provide. :-)

Alex Kirk

I would certainly think hardware is the place to look.

Just so you know, we still run a server on FBSD 4.8, and it runs very well. We have 4.8, 4.11, 5.2.1, 5.4, 6.1, and 6.2. Oh, and a couple Linux, NetBSD, and Solaris boxen too.

I prefer not to chase versions on high-load production equipment, certainly not as a problem-resolution strategy. For the record, I have never had a blind upgrade fix an unidentified problem, and if one did I would be very worried.

I would guess memory; at least, that is where I would look first. I would also wonder what environment the server runs in: heat is a killer, and so is vibration. Loose racks and humming floors can and will cause connections to slip. I have fixed servers that ran for months and then suddenly showed odd behavior simply by powering them down, removing all cards, RAM, and cables, and then reattaching everything.

Mysterious failures, 3000 miles to the console, I don't envy you ;^)

DAve


--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
