Theo Van Dinter wrote:
On Mon, Sep 18, 2006 at 05:48:35PM +0200, Stéphane LEPREVOST wrote:
I'm surprised that nobody seems to complain for problems with auto-whitelist
file growing. Of course, it may be only a question of free time to do this
(I understand SO much these problems...) but does someone knows if someone
plan to code it in a few time ?

Last I heard, AWL was going to be replaced by the History plugin, which
would do expiry and such, but that plugin has since never gone anywhere.

I was also thinking about changing our DBM access to generically support
timestamps for all entries, then have some general functions included to
support expiry.  That way we'd have expiry for all DBM-related systems,
instead of currently where Bayes has it's own stuff specifically coded
in, and things like AWL have none.

As usual, patches welcome. :)


In the interim, there's the check_whitelist and trim_whitelist scripts, and I've amalgamated their functionality into the attached maintain_whitelist script which can be cronned up easily. I guess you could consider this a "patch" ;)

The usual caveats apply; I've only given this minimal testing but it seems to work fine. General usage is:

maintain_whitelist -d /path/to/auto-whitelist -c -t

See maintain_whitelist --help for detail.

Cheers,
Alex
#!/usr/bin/perl -w

use strict;
use Fcntl;
use Getopt::Long;
use File::Basename;
# maybe use this for locking? should really use the one from 
Mail::SpamAssassin::Conf
# use Mail::SpamAssassin::Locker::Flock;

BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File SDBM_File); }
use AnyDBM_File;

my ($awldb, $stats, $clean, $trim, $min, $quiet, $help) = 
("$ENV{HOME}/.spamassassin/auto_whitelist", 0, 0, 0, 2, 0, 0);
GetOptions(
    'dbpath|d=s' => \$awldb,
    'stats|s!'   => \$stats,
    'clean|c!'   => \$clean,
    'trim|t!'    => \$trim,
    'min|m=i'    => \$min,
    'quiet|q!'   => \$quiet,
    'help|h'     => \$help,
) or usage(1);
$help and usage(0);

sub usage {
    # at least attempt to be portable...
    my $mi = File::Basename::basename($0);
    print <<"EOH";
Usage: $mi [options]
  -d,--dbpath=<path>    Path to AWL db       [~/.spamassassin/auto_whitelist]
  -s,--stats            Generate AWL statistics like check_whitelist
  -c,--clean            Purge AWL db of rarely-seen entries
  -m,--min=<seen>       Keep entries seen <seen> times or more            [2]
  -t,--trim             Copy AWL to new db to reduce size
  -q,--quiet            Don't print summary counts post-cleaning/trimming
  -h,--help             Print this highly useful and informative text

EOH
   exit(shift);
}

die "Auto-Whitelist file doesn't appear to exist!\n" unless -f $awldb;
die "Clean operation requested but unable to open database read-write!\n"
  unless -w $awldb;

my %awldb;
my $awldb_obj;
my %newdb;
my $newdb_obj;

if ($trim or $clean) {
    $awldb_obj = tie %awldb, "AnyDBM_File", "$awldb", O_RDWR, 0600
      or die "Cannot open db file $awldb: $!\n";
    if ($trim) {
        $newdb_obj = tie %newdb, "AnyDBM_File", "$awldb-new", O_RDWR|O_CREAT, 
0600
          or die "Cannot open db file $awldb-new: $!\n";
    }
} else {
    warn "No mode of operation given, assuming --stats\n" unless $stats;
    tie %awldb, "AnyDBM_File", $awldb, O_RDONLY, 0600
      or die "Cannot open file $awldb: $!\n";
}

my ($kept, $cleaned, $all);
for my $k (grep !/totscore$/, keys %awldb) {
    my $count = $awldb{$k};
    my $score = $awldb{"$k|totscore"};
    unless (defined $score and defined $count) {
        if ($clean) {
            delete $awldb{$k};
            delete $awldb{"$k|totscore"};
        }
        next;
    }
    ++$all;
    if ($stats) {
        printf "% 8.1f %15s  --  %s\n",
          $score/$count, (sprintf "(%.1f/%d)",$score,$count), $k;
    }
        
    if ($trim) {
        ++$cleaned, next if $clean and $count < $min;
        ++$kept;
        $newdb{$k} = $count;
        $newdb{"$k|totscore"} = $score;
        # an attempt to cut down on the memory usage
        # it doesn't appear to work too well, unfortunately
        $newdb_obj->sync() unless $all % 100;
    } elsif ($clean) {
        ++$kept, next unless $count < $min;
        ++$cleaned;
        delete $awldb{$k};
        delete $awldb{"$k|totscore"};
        $awldb_obj->sync() unless $cleaned % 100;
    }
}

undef $awldb_obj;
undef $newdb_obj;
untie %awldb;
untie %newdb if %newdb;

my $dir = File::Basename::dirname($awldb);
my ($uid, $gid) = (stat $awldb)[4,5];

if ($trim) {
    my $rv = system(qq~mv -f "$awldb-new" "$awldb"~);
    die "Move of $awldb-new to $awldb failed! Check $dir for 
inconsistencies.\n" if $rv;
}
chown($uid, $gid, $awldb) unless $<;

print "Total: $all - Cleaned: $cleaned - Kept: $kept\n" unless $quiet;

Reply via email to