IMO, all the AWL needs is an automatic expiry system like the one Bayes has.

For us as a college, the AWL makes a HUGE difference when students submit their
theses, term papers, etc., which at times may be on sexual debauchery, KP,
internet scams, etc. The AWL sees that all previous messages from this
individual over the last x years have been good, so it does not block this
important email. We enabled this feature as a direct result of faculty
complaints that some students' most important / critical work sometimes
appeared as spam and was missed as a result.


-----Original Message-----
From: Alex Woick [mailto:[EMAIL PROTECTED] 
Sent: Saturday, January 20, 2007 12:24 PM
To: Matt Kettler
Cc: Andy Figueroa; users@spamassassin.apache.org
Subject: Re: use or not use awl

Matt Kettler wrote:
> That said, I think the AWL is a great idea, but not ready for 
> production use on servers with reasonable mail volume. I say that 
> because it completely lacks any kind of useful (i.e., atime-based) expiry
> mechanism.
> The only way to prune the AWL database is by hitcount, using the 
> check_whitelist script from the tools directory of the source tarball.
>   
Not necessarily. Put your AWL in an SQL database and add a timestamp column
to the awl table, which the DBMS automatically sets to the current time each
time a record is updated. In MySQL, a "timestamp" column declared with
ON UPDATE CURRENT_TIMESTAMP behaves exactly this way.

Output of "show create table awl":

CREATE TABLE `awl` (
  `username` varchar(100) collate latin1_german1_ci NOT NULL default '',
  `email` varchar(200) collate latin1_german1_ci NOT NULL default '',
  `ip` varchar(10) collate latin1_german1_ci NOT NULL default '',
  `count` int(11) default '0',
  `totscore` float default '0',
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`username`,`email`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci
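If the awl table already exists from the stock SpamAssassin schema, the column can probably be added in place instead of recreating the table. A minimal sketch, assuming MySQL and the column name `timestamp` used in the definition above; existing rows receive a value when the column is added, so check what they got before running the first date-based purge.

```sql
-- Add an auto-maintained "last updated" column to an existing awl table.
-- DEFAULT sets it when a row is inserted; ON UPDATE refreshes it whenever
-- an UPDATE changes the row (the AWL changes count/totscore on each hit).
ALTER TABLE awl
  ADD COLUMN `timestamp` TIMESTAMP NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
```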

Then you can easily expire by date with a cron job, for example, to expire
everything that has not been updated in the last 30 days:

delete from awl where timestamp < now() - interval 30 day

If you run that SQL statement often and have a large awl table, you
may want to add an index on the timestamp column. You can also write a
custom SQL statement that combines timestamp and totscore as purge
criteria.
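Both suggestions might look like this, assuming MySQL and the awl table defined above. The index name `awl_timestamp_idx` and the thresholds (30 days, 7 days, an average of 5 or more points per hit) are hypothetical choices for illustration, not SpamAssassin defaults. The idea is to keep the plain 30-day expiry but drop entries with a spammy average score sooner, on the assumption that those addresses are less worth remembering.

```sql
-- Index on the timestamp column so date-based DELETEs can do a range scan
-- instead of reading the whole table (index name is hypothetical).
CREATE INDEX awl_timestamp_idx ON awl (`timestamp`);

-- Combined purge (thresholds are illustrative only):
--   * anything not updated for 30 days is removed, as before;
--   * entries averaging 5+ points per hit are removed after 7 idle days.
-- "totscore >= 5 * count" tests the average without dividing by count.
-- The outer 7-day condition covers both cases and lets the index be used.
DELETE FROM awl
 WHERE `timestamp` < NOW() - INTERVAL 7 DAY
   AND (`timestamp` < NOW() - INTERVAL 30 DAY
        OR (`count` > 0 AND `totscore` >= 5 * `count`));
```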

Alex

