I manually ran sa-learn --force-expire, and it hammered the box.  Here is the debug and timing information (for just a 5 MB file!):

[18002] dbg: bayes: tie-ing to DB file R/O /home/ian/.spamassassin/bayes_toks
[18002] dbg: bayes: tie-ing to DB file R/O /home/ian/.spamassassin/bayes_seen
[18002] dbg: bayes: found bayes db version 3
[18002] dbg: bayes: DB journal sync: last sync: 1161899721
[18002] dbg: bayes: opportunistic call found journal sync due
[18002] dbg: bayes: bayes journal sync starting
[18002] dbg: bayes: tie-ing to DB file R/W /home/ian/.spamassassin/bayes_toks
[18002] dbg: bayes: tie-ing to DB file R/W /home/ian/.spamassassin/bayes_seen
[18002] dbg: bayes: found bayes db version 3
[18002] dbg: bayes: synced databases from journal in 0 seconds: 792 unique entries (974 total entries)
[18002] dbg: bayes: bayes journal sync completed
[18002] dbg: bayes: bayes journal sync starting
[18002] dbg: bayes: bayes journal sync completed
[18002] dbg: bayes: expiry starting
[18002] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[18002] dbg: bayes: token count: 161725, final goal reduction size: 49225
[18002] dbg: bayes: first pass? current: 1161986180, Last: 1161862273, atime: 691200, count: 10015, newdelta: 140627, ratio: 4.91512730903645, period: 43200
[18002] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[18002] dbg: bayes: expiry max exponent: 9
------ about 20 seconds elapsed
[18002] dbg: bayes: atime token reduction
[18002] dbg: bayes: ======== ===============
[18002] dbg: bayes: 43200 144256
[18002] dbg: bayes: 86400 133029
[18002] dbg: bayes: 172800 111350
[18002] dbg: bayes: 345600 72306
[18002] dbg: bayes: 691200 9457
[18002] dbg: bayes: 1382400 0
[18002] dbg: bayes: 2764800 0
[18002] dbg: bayes: 5529600 0
[18002] dbg: bayes: 11059200 0
[18002] dbg: bayes: 22118400 0
[18002] dbg: bayes: first pass decided on 691200 for atime delta
------ about 40 seconds elapsed [a sort going on here???]
[18002] dbg: bayes: untie-ing
[18002] dbg: bayes: untie-ing db_toks
[18002] dbg: bayes: untie-ing db_seen
[18002] dbg: bayes: files locked, now unlocking lock
expired old bayes database entries in 60 seconds <= YIKES
152268 entries kept, 9457 deleted
token frequency: 1-occurrence tokens: 68.79%
token frequency: less than 8 occurrences: 18.63%
[18002] dbg: bayes: expiry completed
.
real    1m6.157s
user    0m56.044s <= WOW!
sys     0m2.370s
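
For anyone trying to follow the numbers in that debug output, here is a rough Python sketch of the arithmetic. All figures are copied from the log above. The maximum of 150000 is only inferred from "0.75 * max: 112500", and the selection rule in pick_delta() is my guess that happens to match this output; it is not taken from the SpamAssassin source. All function names are made up for illustration.

```python
# Sketch of the expiry arithmetic seen in the debug output above.
# Figures come from the log; the rules marked "assumed" are guesses.

MAX_DB_SIZE = 150_000        # assumed: implied by "0.75 * max: 112500"
TOKEN_COUNT = 161_725        # "token count: 161725"
LAST_ATIME_DELTA = 691_200   # "atime: 691200" (previous expiry run)
LAST_EXPIRE_COUNT = 10_015   # "count: 10015" (previous expiry run)

# (atime delta in seconds, tokens that would be expired), from the log table
FIRST_PASS = [
    (43_200, 144_256), (86_400, 133_029), (172_800, 111_350),
    (345_600, 72_306), (691_200, 9_457), (1_382_400, 0),
    (2_764_800, 0), (5_529_600, 0), (11_059_200, 0), (22_118_400, 0),
]


def keep_size(max_db_size):
    """Number of tokens to keep: 75% of the configured maximum."""
    return int(0.75 * max_db_size)


def goal_reduction(token_count, max_db_size):
    """How many tokens expiry would like to remove."""
    return token_count - keep_size(max_db_size)


def estimated_delta(last_delta, last_count, goal):
    """Reproduce the logged 'ratio' and 'newdelta' values."""
    ratio = goal / last_count
    return ratio, int(last_delta / ratio)


def pick_delta(table, goal):
    """Assumed rule: largest non-zero reduction that does not exceed the goal."""
    candidates = [(delta, count) for delta, count in table if 0 < count <= goal]
    return max(candidates, key=lambda c: c[1]) if candidates else None


if __name__ == "__main__":
    goal = goal_reduction(TOKEN_COUNT, MAX_DB_SIZE)
    print("goal reduction:", goal)
    print("ratio, newdelta:", estimated_delta(LAST_ATIME_DELTA, LAST_EXPIRE_COUNT, goal))
    delta, expired = pick_delta(FIRST_PASS, goal)
    print("chosen delta:", delta, "expired:", expired, "kept:", TOKEN_COUNT - expired)
```

Under those assumptions the sketch reproduces the logged goal reduction, ratio, newdelta, the chosen atime delta of 691200, and the kept/deleted counts. It says nothing about where the 60 seconds of CPU time actually go.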



Anders Norrbring <[EMAIL PROTECTED]> wrote:
Sorry about top-posting, but I just caught the topic and found it a
bit interesting...

I run my SMTP server entirely in a VMware VM, and have *never* seen a
high CPU usage on that particular machine. I run Postfix, amavisd-new
2.4.3, SA 3.1.7 and quite a few plug-ins.

Bayes and quarantine are all in a MySQL database stored on another VM,
no big load there either...
At peaks, I have a 2-4% CPU usage and 20-65% memory usage on each VM,
all reported by Virtual Center 1.4.

So, naturally I'm curious about why there would be a high CPU load from
using SA.... My guess is that it's something else causing it.

--

Anders Norrbring
Norrbring Consulting

Sammy Anderson skrev:
> I'm pretty sure it is that, because when I turn off Bayes altogether, the
> spikes go away. I also ran sa-learn --force-expire and it PEGS the VM.
> With bayes debugging enabled, I see lines like this in my syslog:
>
> bayes: expired old bayes database entries in 236 seconds: 152268 entries
> kept, 9457 deleted
>
> We have about 140 users, each with a 5 MB bayes_toks file, so somebody's
> database needs expiring all throughout the day. Each user is virtual;
> they don't really have an account on the box, but the directories
> correspond to each user address. And we do auto-learn, with
> opportunistic expiry.
>
> Good thought about --round-robin, I am willing to use a little more
> memory if it saves on CPU.
>
> "Ring, John C" wrote:
>
> >From: Sammy Anderson [mailto:[EMAIL PROTECTED]
> >
> >We recently migrated our SpamAssassin installation from a physical 3.6 GHz system
> >running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with RHEL 4 as the guest OS
> >and SA 3.1.7.
>
> I just did the same thing last week, except we're using RHEL 3 and ESX
> 2.5.2, and the physical box it used to be on was far less powerful than
> yours.
>
> >Each user has their own Bayes files (Berkeley DB) and these were copied from the old to
> >the new server. Now whenever an expiry process runs on a user's database, the CPU
> >spikes, sometimes for a minute or longer.
>
> Hmm. We're using ours as a site-wide MTA to be able to reject incoming
> mails at SMTP time, so no user DBs on the box, but we are running with
> Bayes checking on (Berkeley DB), autolearning off, and manual Bayes
> feeding only a few times a day. Because of that, I don't have practice
> with a heavy Bayes load, but how certain are you that it's Bayes hitting
> the CPU; did you run sa-learn (or spamassassin) with network reporting
> turned off to see if that makes a difference?
>
> I ask because pyzor did keep our CPU at a constant 75% until I turned it
> off; now it varies from 25% to 75% over the day, which is a lot more
> acceptable :)
>
> Another thought, albeit perhaps not directly related: are you running
> spamd with --round-robin? When I did that, it reduced the CPU load with
> the trade-off of using a little more memory, which seems to be the
> better trade-off, especially for a VM on ESX.
>
> --
> John C. Ring, Jr.
> [EMAIL PROTECTED]
> Network Engineer
> Union Switch & Signal Inc.
>
> "If men were angels, no government would be necessary. If angels were to
> govern men, neither external nor internal controls on government would
> be necessary." -- James Madison
>

