Bayes analysis in a system-wide SpamAssassin installation

2008-05-04 Thread Colin Wetherbee

Greetings.

I'm working on creating a system-wide SpamAssassin installation with 
virtual users, using Postfix on the front-end and dbmail on the 
back-end.  I have a few questions regarding this.


With regard to Bayes filtering, I've copied my old, personal Bayes 
databases over to a global Bayes store, in order to use these databases 
as a "seed" for the system-wide filter.  This seems to work (though I 
haven't done extensive analysis to determine whether it really is a good 
thing to do), but I'm concerned about bayes_journal being deleted when I 
run sa-learn.


Viz.:

^ [EMAIL PROTECTED]:/srv/disk_pony/bayes_store$ ls -l
total 46352
-rw-rw 1 dbmail staff    56088 2008-05-04 19:08 bayes_journal
-rw-rw 1 dbmail staff 41824256 2008-05-04 18:53 bayes_seen
-rw-rw 1 dbmail staff  5521408 2008-05-04 18:53 bayes_toks
^ [EMAIL PROTECTED]:/srv/disk_pony/bayes_store$ ls -l
total 46296
-rw-rw-rw- 1 dbmail staff       30 2008-05-04 19:56 bayes.mutex
-rw-rw 1 dbmail staff 41824256 2008-05-04 19:56 bayes_seen
-rw-rw 1 dbmail staff  5521408 2008-05-04 19:56 bayes_toks

Between those `ls -l` runs, I ran the following in a separate terminal.
This ~/bayes_learn directory just contains a bunch of hammy mailing 
list messages that I'm using to test the system.


^ [EMAIL PROTECTED]:~/bayes_learn$ sudo -u dbmail sa-learn --ham 1209916419.26777_1.iron:2,

Learned tokens from 1 message(s) (1 message(s) examined)

So, during sa-learn, the bayes_journal is deleted, and a new bayes.mutex 
shows up.  What is the significance of this?


I'm not sure what the bayes_journal contains, but I know it grows when 
SpamAssassin processes an email.  Its summary deletion during the 
sa-learn process bothers me.


I'd appreciate any comments or observations you may have. :)

This is SpamAssassin version 3.2.4 running on Perl version 5.8.8, from 
their respective Debian packages.


Other possibly relevant stuff follows.

spamd starts with "--max-children 5 --helper-home-dir -l".

/etc/spamassassin/local.cf contains, among other things:

bayes_path /srv/disk_pony/bayes_store/bayes
bayes_file_mode 0666

Thanks.

Colin


Re: Bayes analysis in a system-wide SpamAssassin installation

2008-05-04 Thread Colin Wetherbee

Theo Van Dinter wrote:
> On Sun, May 04, 2008 at 02:14:23PM -0400, Colin Wetherbee wrote:
>> So, during sa-learn, the bayes_journal is deleted, and a new bayes.mutex
>> shows up.  What is the significance of this?
>
> The journal data was synced into the DB, and the mutex file is there for the
> rw locking mechanism for the db.


Excellent.

Thanks for clearing that up.

Colin
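
For anyone who wants to watch the journal sync described above on their own store, here is a minimal shell sketch. The function name bayes_journal_state is hypothetical and not part of SpamAssassin; it only checks whether a file named bayes_journal exists in the given directory, assuming the "bayes" file prefix from the bayes_path setting quoted earlier. The commented commands use sa-learn's --sync and --dump magic options; the path and the dbmail user are taken from the thread and will differ elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): report whether a Bayes
# store directory currently holds a journal file.
# Assumes the file prefix is "bayes", as in
#   bayes_path /srv/disk_pony/bayes_store/bayes
bayes_journal_state() {
    store_dir=$1
    if [ -e "$store_dir/bayes_journal" ]; then
        echo "journal present"
    else
        echo "journal absent"
    fi
}

# Usage against the store from the thread, run as the owning user:
#   bayes_journal_state /srv/disk_pony/bayes_store
#   sudo -u dbmail sa-learn --sync         # merge the journal into the databases
#   sudo -u dbmail sa-learn --dump magic   # print counters, incl. last journal sync
#   bayes_journal_state /srv/disk_pony/bayes_store
```

An absent journal only means there is nothing waiting to be merged; it does not by itself prove that a sync has just run, so the --dump magic output is the better place to confirm when the last sync happened.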