John Hardin wrote:
On Mon, 30 Jan 2012, Kris Deugau wrote:

These are good places to look for fixing up problems with bad
performance under heavy load... the problem at hand is random lockups
during *normal* (or low) load levels - ie, 1-5 messages per second,
where "high" load means 30+/s.

Are you willing to set concurrency limits in your glue as was discussed
earlier? I suspect that would avoid the problem in the least-impactful
manner. If you have, do they appear to successfully avoid the issue?

Granted that's avoiding the problem rather than actually fixing a bug,
but this is a production system and keeping it in production is
important...

Actually, it looks like this was done a while ago to make sure the usual peak loads don't peak quite so hard, at the expense of occasionally having more than a handful of messages in the MX queues. The mail delivery concurrency should be limited to 30 deliveries.

As another stopgap measure, I've tweaked the daily SOUGHT rules update cron job to unconditionally restart SA; I'm debating adding another restart or two (7am, maybe 8pm?).

In any case, to put the problem yet another way... the lockup *causes* the high load condition (eventually), but in the meantime spamd is effectively down for 10-15 minutes or so. That is something I have never seen any sign of before.

The child-state info logged looks like this:

23:23:32 mfs2 spamd/main[26981]: prefork: child states: BI
23:23:34 mfs2 spamd/main[26981]: prefork: child states: BI
23:23:35 mfs2 spamd/main[26981]: prefork: child states: BI
23:39:20 mfs2 spamd/main[26981]: prefork: child states: BB
23:39:20 mfs2 spamd/main[26981]: prefork: child states: BBB
23:39:20 mfs2 spamd/main[26981]: prefork: child states: BBBB
23:39:20 mfs2 spamd/main[26981]: prefork: child states: BBBBB

The only log entries between the 23:23:35 and 23:39:20 timestamps are:

23:23:37 mfs2 spamd/main[4625]: spamd: connection from mx2-r.vianet.ca [<ip>] at port 47334
23:23:37 mfs2 spamd/main[4625]: spamd: processing message <msgid> for <user>:999
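
To spot this kind of stall without eyeballing the logs, something like the following rough sketch could scan the "prefork: child states" lines for long silences. The regex is matched only against the line format shown above, the 5-minute threshold is an arbitrary assumption, and it assumes "B" means a busy child and "I" an idle one. The function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the child-state lines quoted in this message; other syslog
# layouts (with a date prefix, for example) would need a different pattern.
LINE_RE = re.compile(
    r"^(\d\d:\d\d:\d\d) \S+ spamd/main\[\d+\]: prefork: child states: (\S+)$"
)

SAMPLE = """\
23:23:32 mfs2 spamd/main[26981]: prefork: child states: BI
23:23:34 mfs2 spamd/main[26981]: prefork: child states: BI
23:23:35 mfs2 spamd/main[26981]: prefork: child states: BI
23:39:20 mfs2 spamd/main[26981]: prefork: child states: BB
23:39:20 mfs2 spamd/main[26981]: prefork: child states: BBB
"""


def parse_states(text):
    """Return a list of (time, state_string) for each child-state line."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append((datetime.strptime(m.group(1), "%H:%M:%S"), m.group(2)))
    return entries


def find_gaps(entries, threshold=timedelta(minutes=5)):
    """Return (start, end, seconds, state_after) for each silence >= threshold."""
    gaps = []
    for (t0, _), (t1, s1) in zip(entries, entries[1:]):
        delta = t1 - t0
        if delta < timedelta(0):
            delta += timedelta(days=1)  # timestamps wrapped past midnight
        if delta >= threshold:
            gaps.append((t0.strftime("%H:%M:%S"), t1.strftime("%H:%M:%S"),
                         int(delta.total_seconds()), s1))
    return gaps


def all_busy(states):
    """True when every child is marked B (assumed to mean busy)."""
    return bool(states) and set(states) == {"B"}


if __name__ == "__main__":
    for gap in find_gaps(parse_states(SAMPLE)):
        print(gap, "all busy afterwards:", all_busy(gap[3]))
```

A gap followed by an all-B state string is the pattern seen here: no state changes logged for a quarter of an hour, then every child busy as the backlog arrives.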

-kgd
