On Wed, 8 Mar 2006, Daryl C. W. O'Shea wrote:

On 08/03/06 09:17 PM, Dan Mahoney, System Admin wrote:
On Tue, 7 Mar 2006, Theo Van Dinter wrote:

On Tue, Mar 07, 2006 at 08:14:38PM -0500, Dan Mahoney, System Admin wrote:

[Odd SQL Error]

Just because the configs are in a SQL database doesn't mean that they'll work even though they are invalid. ;)

Underscore or not, they're not valid.

That's fine -- but the format they're IN in the DB *IS* valid. Somewhere between being in the DB and getting read into spamd, something's putting an underscore where there should be whitespace (hence my logic in pasting those SQL queries -- note that it also ONLY happens for certain prefs, too -- another user could have identical prefs and be fine).

It's a separate issue from the lockups -- and I'm not sure why it's happening -- but it's definitely an issue. What action is best to take on this? Bug report? It goes without saying here, but if anyone needs a shell (I think JM has one on the box where spamc runs), let me know and I'll get you one without question.

[pyzor, razor, dcc]

I could see trouble with any ONE of these things -- but all three?
Something feels odd.

Do you have a firewall blocking those services from working?

No firewall at all, and those processes hit (as in, generate scores) most of the time. I'm under the impression that if they were going to be killed by a firewall config, it would be a very all-or-nothing type of thing.

It's a realtek NIC, but that shouldn't matter at all since ALL this box does is mysql and spam processes (less than a meg). Duplex is 100/full (auto) on a known good (but unmanaged) switch.

Well, if the timeouts (and they are just timeouts) are congestion related, it would make sense that all three would time out.

True, of course. I don't think it's congestion either -- although at some point we may start running our own (public) DCC server in-house. Need to speak to Vernon about that.

And there was the mention I saw earlier of still-escaping alarms. However, I haven't had a true lockup since making all these changes. Usually those happen late at night when I say "they musta fixed it, things haven't locked up in a while..." :)

Mine only hangs up when I think to myself, "gee it's been awhile since it last hung up" and then proceed to travel at least three hours away from the server, often far from connectivity or into exceptionally slow or unreliable dial-up land.

It's at the point where I've thought about having a process tail -F the log for the occurrence of BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB, and take action on it. There's only been once or twice where it's hung up AGAIN after restarting -- usually on a ton of messages for one user. This is usually the point at which I send a message to the mailing list about "can we please get a way to dump messages-in-process to a single file, a la the apache server-status screen" so I can see WHY it's hanging up, and on which rule.
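
The log-watching idea above could be sketched roughly as follows. This is only an illustration, not anything shipped with SpamAssassin: the names `follow`, `watch` and `on_hang` are made up here, the marker is simply the run of B's quoted in the message, and what "take action" means (restart, page someone) is left to the caller. Unlike `tail -F`, this simple follower does not reopen the file after log rotation.

```python
import os
import time
from typing import Callable, Iterable, Iterator, TextIO

# The run of B's quoted in the message above, copied as-is.
MARKER = "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"


def follow(handle: TextIO, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield lines appended to an already-open log file, roughly like `tail -f`.

    Assumption: the file is not rotated while we read it.
    """
    handle.seek(0, os.SEEK_END)  # start at the end, ignore old entries
    while True:
        line = handle.readline()
        if not line:
            time.sleep(poll_seconds)  # nothing new yet, wait and retry
            continue
        yield line


def watch(
    lines: Iterable[str],
    on_hang: Callable[[str], None],
    marker: str = MARKER,
) -> int:
    """Call on_hang(line) for every line containing the marker.

    Returns the number of matching lines seen (only reached if the
    input is finite, e.g. when replaying an old log).
    """
    hits = 0
    for line in lines:
        if marker in line:
            hits += 1
            on_hang(line)  # hypothetical hook: restart, alert, etc.
    return hits
```

Hooked up to a live log it would look something like `watch(follow(open(path)), restart_spamd)`, where both `path` and `restart_spamd` are placeholders for whatever the local setup uses.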

On the thought that it's congestion, maybe I should start running smokeping against the various servers utilized by dcc/pyzor/razor -- would anyone else be interested in this data?

I'll let you guys know as soon as I do (it does), however. __alarm__ messages without a corresponding freeze CONCERN me, but not NEARLY as much as when the system is "of a down".

"__alarm__" messages without another warning immediately afterwards explaining them are bad.

"__alarm__ignore__" messages are just timeouts that can be, well, ignored.
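
The distinction drawn above can be turned into a small log filter. The sketch below rests on one loud assumption: it treats any line containing "warn" as the "warning explaining it", since the thread does not say what those warning lines look like; pass a different `is_warning` to match the real format. The function names are invented for this example.

```python
from typing import Callable, Iterable, List


def classify(line: str) -> str:
    """Label a log line as 'ignorable', 'alarm', or 'other'.

    "__alarm__ignore__" must be checked first because it also
    contains the shorter "__alarm__" string.
    """
    if "__alarm__ignore__" in line:
        return "ignorable"
    if "__alarm__" in line:
        return "alarm"
    return "other"


def default_is_warning(line: str) -> bool:
    # ASSUMPTION: a warning line contains "warn"; adjust to the real log format.
    return "warn" in line.lower()


def unexplained_alarms(
    lines: Iterable[str],
    is_warning: Callable[[str], bool] = default_is_warning,
) -> List[str]:
    """Return bare "__alarm__" lines that are NOT immediately followed by a warning."""
    buffered = list(lines)
    flagged = []
    for i, line in enumerate(buffered):
        if classify(line) != "alarm":
            continue
        following = buffered[i + 1] if i + 1 < len(buffered) else None
        if following is None or not is_warning(following):
            flagged.append(line)
    return flagged
```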

I await your reply :)

-Dan

--

"The first annual 5th of July party...have you been invited?"
"It's a Jack Party."
"Okay, so Long Island's been invited."

--Cali and Gushi, 6/23/02


--------Dan Mahoney--------
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---------------------------
