Theo Van Dinter wrote:
On Mon, Sep 18, 2006 at 05:48:35PM +0200, Stéphane LEPREVOST wrote:
I'm surprised that nobody seems to complain for problems with auto-whitelist
file growing. Of course, it may be only a question of free time to do this
(I understand SO much these problems...) but does someone knows if someone
plan to code it in a few time ?
Last I heard, AWL was going to be replaced by the History plugin, which
would do expiry and such, but that plugin has since never gone anywhere.
I was also thinking about changing our DBM access to generically support
timestamps for all entries, then have some general functions included to
support expiry. That way we'd have expiry for all DBM-related systems,
instead of currently where Bayes has it's own stuff specifically coded
in, and things like AWL have none.
As usual, patches welcome. :)
In the interim, there's the check_whitelist and trim_whitelist scripts,
and I've amalgamated their functionality into the attached
maintain_whitelist script which can be cronned up easily. I guess you
could consider this a "patch" ;)
The usual caveats apply; I've only given this minimal testing but it
seems to work fine. General usage is:
maintain_whitelist -d /path/to/auto-whitelist -c -t
See maintain_whitelist --help for detail.
Cheers,
Alex
#!/usr/bin/perl -w
use strict;
use Fcntl;
use Getopt::Long;
use File::Basename;
# maybe use this for locking? should really use the one from
Mail::SpamAssassin::Conf
# use Mail::SpamAssassin::Locker::Flock;
BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File SDBM_File); }
use AnyDBM_File;
my ($awldb, $stats, $clean, $trim, $min, $quiet, $help) =
("$ENV{HOME}/.spamassassin/auto_whitelist", 0, 0, 0, 2, 0, 0);
GetOptions(
'dbpath|d=s' => \$awldb,
'stats|s!' => \$stats,
'clean|c!' => \$clean,
'trim|t!' => \$trim,
'min|m=i' => \$min,
'quiet|q!' => \$quiet,
'help|h' => \$help,
) or usage(1);
$help and usage(0);
sub usage {
# at least attempt to be portable...
my $mi = File::Basename::basename($0);
print <<"EOH";
Usage: $mi [options]
-d,--dbpath=<path> Path to AWL db [~/.spamassassin/auto_whitelist]
-s,--stats Generate AWL statistics like check_whitelist
-c,--clean Purge AWL db of rarely-seen entries
-m,--min=<seen> Keep entries seen <seen> times or more [2]
-t,--trim Copy AWL to new db to reduce size
-q,--quiet Don't print summary counts post-cleaning/trimming
-h,--help Print this highly useful and informative text
EOH
exit(shift);
}
die "Auto-Whitelist file doesn't appear to exist!\n" unless -f $awldb;
die "Clean operation requested but unable to open database read-write!\n"
unless -w $awldb;
my %awldb;
my $awldb_obj;
my %newdb;
my $newdb_obj;
if ($trim or $clean) {
$awldb_obj = tie %awldb, "AnyDBM_File", "$awldb", O_RDWR, 0600
or die "Cannot open db file $awldb: $!\n";
if ($trim) {
$newdb_obj = tie %newdb, "AnyDBM_File", "$awldb-new", O_RDWR|O_CREAT,
0600
or die "Cannot open db file $awldb-new: $!\n";
}
} else {
warn "No mode of operation given, assuming --stats\n" unless $stats;
tie %awldb, "AnyDBM_File", $awldb, O_RDONLY, 0600
or die "Cannot open file $awldb: $!\n";
}
my ($kept, $cleaned, $all);
for my $k (grep !/totscore$/, keys %awldb) {
my $count = $awldb{$k};
my $score = $awldb{"$k|totscore"};
unless (defined $score and defined $count) {
if ($clean) {
delete $awldb{$k};
delete $awldb{"$k|totscore"};
}
next;
}
++$all;
if ($stats) {
printf "% 8.1f %15s -- %s\n",
$score/$count, (sprintf "(%.1f/%d)",$score,$count), $k;
}
if ($trim) {
++$cleaned, next if $clean and $count < $min;
++$kept;
$newdb{$k} = $count;
$newdb{"$k|totscore"} = $score;
# an attempt to cut down on the memory usage
# it doesn't appear to work too well, unfortunately
$newdb_obj->sync() unless $all % 100;
} elsif ($clean) {
++$kept, next unless $count < $min;
++$cleaned;
delete $awldb{$k};
delete $awldb{"$k|totscore"};
$awldb_obj->sync() unless $cleaned % 100;
}
}
undef $awldb_obj;
undef $newdb_obj;
untie %awldb;
untie %newdb if %newdb;
my $dir = File::Basename::dirname($awldb);
my ($uid, $gid) = (stat $awldb)[4,5];
if ($trim) {
my $rv = system(qq~mv -f "$awldb-new" "$awldb"~);
die "Move of $awldb-new to $awldb failed! Check $dir for
inconsistencies.\n" if $rv;
}
chown($uid, $gid, $awldb) unless $<;
print "Total: $all - Cleaned: $cleaned - Kept: $kept\n" unless $quiet;