Thanks for the responses. A few questions: will running 'check_whitelist' affect our server's performance? Do I risk creating other problems if I leave things as they are until our sys admin returns? :)
On 7/18/07, Matt Kettler <[EMAIL PROTECTED]> wrote:
Tammy George wrote:
> Hello.
>
> Our Linux server is running SpamAssassin version 3.1.5.
>
> Backups started dying with 'inactivity timeout'. Dug around & found
> the following:
>
> drwx------   3 vscan vscan           512 Jul 18 16:28 .
> -rw-------   1 vscan vscan 1099983372288 Jul 18 16:28 auto-whitelist
> -rw-------   1 vscan vscan    1205862400 Jul 18 16:28 bayes_seen
> -rw-------   1 vscan vscan      10846208 Jul 18 16:28 bayes_toks
> -rw-------   1 vscan vscan         18240 Jul 18 16:28 bayes_journal
> drwxr-x---  12 vscan vscan          1024 Jul 18 12:12 ..
> -rw-------   1 vscan vscan       2654208 Jan 26  2005 bayes_toks.expire42066
> -rw-------   1 vscan vscan        606208 Mar 30  2004 bayes_toks.expire93303
> drwxr-xr-x   2 vscan vscan           512 Jan 28  2004 old
> -rw-r--r--   1 vscan vscan          1165 Jan 27  2004 user_prefs
>
> A du -k shows auto-whitelist as being 1747968.
>
> Surprisingly, we aren't experiencing any problems other than the
> backups. Our site handles A LOT of email.
>
> After I send this email, I'm going to look into check_whitelist and
> trim_whitelist (and probably sa-learn re: the bayes files), however,
> any suggestions would be most appreciated! Our sys admin is on
> vacation and he's our expert.

For the auto-whitelist file you need to run this command:

check_whitelist --clean /path/to/auto-whitelist

That said, IMHO, the AWL isn't really ready for production use on large
systems unless you're going to run it on SQL and use your own scripts to
do expiry.

The bayes_toks and bayes_journal files auto-expire, so you don't need to
do anything to them.

The bayes_seen file doesn't have any kind of date information, so it
can't auto-expire. However, you can remove the file reasonably safely.
This file is just a list of all the files that have already been run
through sa-learn. The only drawback to deleting it is that it will allow
you to re-train a message that you've already learned. So if you
maintain a massive directory of files to be "relearned" but don't clean
it out, you might have a minor amount of over-learning (no big deal).

>
> Thanks in advance for any advice.
>
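
A note on the two sizes quoted above: ls reports auto-whitelist at 1099983372288 bytes (about 1 TiB), while du -k reports 1747968 KB (1707 MiB). One possible reading, which is an assumption and not something confirmed in this thread, is that the file is sparse, so a backup tool that walks the apparent size has far more to get through than the disk usage suggests. A minimal sketch for comparing the two figures on any file follows; the function name size_report is made up for illustration and is not part of SpamAssassin.

```shell
#!/bin/sh
# Compare a file's apparent size (what `ls -l` reports) with the space it
# actually occupies on disk (what `du -k` reports). A large gap between the
# two suggests a sparse file.
# NOTE: size_report is a hypothetical helper, not a SpamAssassin tool.
size_report() {
    file=$1
    # Field 5 of `ls -ln` is the apparent size in bytes; -n keeps the
    # owner and group numeric so the field positions stay predictable.
    apparent=$(ls -ln "$file" | awk '{print $5}')
    # `du -k` prints the allocated space in 1 KB units as its first field.
    allocated_kb=$(du -k "$file" | awk '{print $1}')
    echo "$file apparent_bytes=$apparent allocated_kb=$allocated_kb"
}

# Example (path is a placeholder, as in the command quoted above):
#   size_report /path/to/auto-whitelist
```

If apparent_bytes is far larger than allocated_kb times 1024, the file is probably sparse. Running the same check after check_whitelist --clean would show whether the cleanup changed either figure.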