[EMAIL PROTECTED] wrote:
Hello All,
I've recently fallen into the task of administering a FreeBSD
5.4-RELEASE box that acts as the web server for a small non-profit that
I volunteer for. Unfortunately, the system has been having some
extremely vexing stability issues over the last month or so, which even
my 6+ years of experience as an OpenBSD admin have not helped me track
down.
First things first, let me say explicitly that I'm not trying to say
"FreeBSD sucks, it's not stable" or anything like that. It's a fine OS,
and I'm sure that it's either faulty hardware or a misconfiguration of
some sort causing these problems. :-)
That said, here are some of the symptoms the box has been experiencing:
* Occasional random reboots. I've only personally witnessed one, and
they don't happen often, but any time a *NIX box just reboots for no
apparent reason (there was no indication of a problem in any of the
logs, at least that I could see), something really bad is going on.
* Random extreme slowness when logging in via SSH, with the time to get
a shell ranging from a second or two all the way up to 80 seconds. The
box isn't busy enough that it's just slow due to load (especially since,
once you're in, things fly), and it's not just a reverse DNS issue like
I've seen on OpenBSD (this occurs even when logging in from locations
listed in /etc/hosts that resolve properly out of that file). Until I
upgraded to the current version of OpenSSL/OpenSSH, the box would
occasionally just become unresponsive altogether over SSH, not allowing
logins for 15+ minutes at a time.
* Files that are sometimes not found on startup but are found at other
times. Prime example: the Zope CMS that's been installed failed to find
libmysqlclient.so after a planned soft reboot, but found it with no
trouble on a subsequent boot a few minutes later, with no config changes
in between.
* A warning in /var/log/messages that the root filesystem was full, when
it was at 60% capacity (and something like 2% inode capacity); the
problem has yet to repeat, though no files have been cleared off of that
filesystem.
* Random crashes of the Zope/Plone system that's running the main part
of the web site. While I realize that, in and of itself, this means
nothing about the stability of the underlying OS, in the context of all
of the other things going on (as well as the fact that the Zope list has
been unable to help figure out why it's crashing), it seems like it
might be further evidence of a larger problem.
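The transient "filesystem full" warning above is hard to investigate after the fact. One possible approach, sketched here under assumptions: sample block and inode usage at intervals (from cron, say) so that the next occurrence can be matched against a timestamped record. This assumes a reasonably current Python 3 is available; the Python that ships with Zope on this box may well be older. The function names usage_percent and format_sample are made up for illustration.

```python
import os
import time


def usage_percent(path="/"):
    """Return (block_pct, inode_pct) in use on the filesystem holding path."""
    st = os.statvfs(path)
    # Blocks in use, measured against what an unprivileged user could use
    # (used + available), which is roughly what df shows as capacity.
    used = st.f_blocks - st.f_bfree
    usable = used + st.f_bavail
    block_pct = 100.0 * used / usable if usable > 0 else 0.0
    # Some filesystems report zero total inodes; treat that as 0% used.
    if st.f_files > 0:
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    else:
        inode_pct = 0.0
    return block_pct, inode_pct


def format_sample(path="/"):
    """Build one timestamped log line for the given mount point."""
    block_pct, inode_pct = usage_percent(path)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s %s blocks=%.1f%% inodes=%.1f%%" % (stamp, path, block_pct, inode_pct)


if __name__ == "__main__":
    # Assumption: output is appended to a log file by whatever runs this.
    print(format_sample("/"))
```

If a later "filesystem full" message lines up with a sample that still shows plenty of free blocks and inodes, that would point away from genuine disk exhaustion and toward something lower level.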
Thus far, besides simply scanning log files, constantly watching "top"
and "ps", etc., I've not been able to do much with the box. As I said, I
upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the
firewall (there was none before I arrived...don't even get me started on
that). Since the server is in Oregon and I'm in the DC area, the
previous admin will run Memtest for me this weekend and disable
hyperthreading (which has no performance justification here, and which
has caused me stability issues in the past, at least on Linux). That's
about the extent of what I've been able to do to date, since this is a
production box.
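For the slow SSH logins, a measurement that could complement watching top and ps is to time how long the server takes to accept a TCP connection and send its identification line ("SSH-2.0-..."), which happens before any authentication. The following is only a rough sketch, assuming Python 3 on the client side. The function name time_ssh_banner is hypothetical, and the 120 second timeout is an arbitrary value chosen to exceed the 80 second delays described above.

```python
import socket
import time


def time_ssh_banner(host, port=22, timeout=120.0):
    """Return (connect_seconds, banner_seconds, banner) for one connection."""
    start = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        connected = time.monotonic()
        # Read until the first newline: the server's identification line
        # arrives before key exchange or authentication begins.
        data = b""
        while b"\n" not in data:
            chunk = sock.recv(256)
            if not chunk:
                break  # server closed the connection early
            data += chunk
        done = time.monotonic()
    finally:
        sock.close()
    first_line = data.split(b"\n", 1)[0]
    banner = first_line.decode("ascii", "replace").strip()
    return connected - start, done - connected, banner


if __name__ == "__main__":
    # "www.example.org" is a placeholder, not the real server name.
    connect_s, banner_s, banner = time_ssh_banner("www.example.org")
    print("connect=%.2fs banner=%.2fs %s" % (connect_s, banner_s, banner))
```

Run repeatedly, this separates two cases. If the banner itself is sometimes slow to arrive, the delay occurs before authentication starts. If the banner is always quick but the shell is not, the time is being lost later in the login.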
What I'd like to know from you guys is:
* Am I justified in suspecting hyperthreading as a potential cause of
instability?
* Does 5.4-RELEASE have any known bugs that might cause stability issues
like the ones I've described here? More importantly, would an upgrade to
6.2-RELEASE be worthwhile (as is my instinct), in terms of being
generally more stable and/or having better hardware support? Would such
an upgrade be possible/relatively painless to perform without being
physically at a console, as has been the case with OpenBSD over the years?
* Given my dmesg below, do you see any specific problems?
* Do you have any other suggestions for debugging this problem?
Thanks in advance for any help you can provide. :-)
Alex Kirk
I would certainly think hardware is the place to look.
Just so you know, we still run a server on FBSD 4.8, and it runs very
well. We have 4.8, 4.11, 5.2.1, 5.4, 6.1, and 6.2. Oh, and a couple
Linux, NetBSD, and Solaris boxen too.
I prefer not to chase versions on high load production equipment,
certainly not as a problem resolution strategy. For the record, I have
never had a blind upgrade fix an unidentified problem, and if it did I
would be very worried.
I would guess memory; at least that is where I would look first. I would
also wonder what environment the server runs in: heat is a killer, and so
is vibration. Loose racks and humming floors can and will cause
connections to slip. I have fixed servers that ran for months and then
suddenly showed odd behavior simply by powering down, removing all
cards/RAM/cables, and then reattaching everything.
Mysterious failures, 3000 miles to the console, I don't envy you ;^)
DAve