> -----Original Message-----
> From: Brian Ipsen [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, August 24, 2003 10:29 AM
> To: [EMAIL PROTECTED]
> Subject: [Qmail-scanner-general]Suggestion: Option to archive 
> all messages tagged by SpamAssassin
> 
> 
> Hi!
> 
>   I miss an option, where it is possible to specify that 
> qmail-scanner should archive all mails that SpamAssassin 
> identifies as spam.  The reason for this is that I'd like to 
> be able to gather statistics on what rules are triggered on 
> each message - and I can only do this either by storing a 
> copy of each message - or enabling debug-log in SpamAssassin, 
> which unfortunately requires some disk-space. The other way 
> around I'm able to process each message and store the needed 
> data in an SQL database - and afterwards delete the message.
> 
> Regards,
> 
> /Brian
> 

i would start by stripping the tests=(.*) part from X-Spam-Status and
splitting the matched tests on commas:
@tests=split(/\,/,$1);

with 2.60-x, you will run into a small problem with TERSE report which
is no longer an on|off option.  The X-Spam-Status _REPORT_ is
automatically TERSE, and will fold at 78 chars, so your tests= will look
like 

X-Spam-Status: Yes, hits=14.6 required=5.0 tests=BAYES_99,CLICK_BELOW_CAPS,
        DATE_MISSING,SUBJ_HAS_SPACES,SUBJ_HAS_UNIQ_ID autolearn=no
        version=2.60-rc2

on emails that match a large number of rules. a single-line match means
$1 will hold only "BAYES_99,CLICK_BELOW_CAPS," and not grab the folded
lines.  you would need to set a $next_header=0 flag and watch for
leading \t's to catch the header continuation.
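
a rough sketch of that unfolding step, in case it helps. the sub name
spam_tests and the sample header lines are made up for illustration, and
it assumes the list is only ever folded right after a comma (which is
what the example above shows). it tracks continuation lines with an
$in_status flag rather than $next_header, but the idea is the same:

```perl
use strict;
use warnings;

# Hypothetical helper: gather X-Spam-Status plus its folded continuation
# lines, then return the rule names listed after tests=.
sub spam_tests {
    my @lines = @_;
    my ($status, $in_status) = ('', 0);
    for my $line (@lines) {
        (my $l = $line) =~ s/\r?\n\z//;         # strip LF or CRLF
        last if $l eq '';                       # blank line: end of headers
        if ($l =~ /^X-Spam-Status:\s*(.*)/i) {
            ($status, $in_status) = ($1, 1);
        } elsif ($in_status && $l =~ /^[ \t]+(.*)/) {
            $status .= " $1";                   # folded continuation line
        } else {
            $in_status = 0;                     # some other header
        }
    }
    $status =~ s/,\s+/,/g;                      # rejoin a list folded after a comma
    return () unless $status =~ /\btests=(\S*)/;
    return grep { length } split /,/, $1;
}

# Sample header, modelled on the folded example above.
my @header = (
    "X-Spam-Status: Yes, hits=14.6 required=5.0 tests=BAYES_99,CLICK_BELOW_CAPS,\n",
    "\tDATE_MISSING,SUBJ_HAS_SPACES,SUBJ_HAS_UNIQ_ID autolearn=no\n",
    "\tversion=2.60-rc2\n",
    "Subject: hello\n",
);
my @tests = spam_tests(@header);
print join("\n", @tests), "\n";
```

the same sub should also cope with the name=score form, since it only
splits on commas and leaves each entry untouched.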

it'll take a little work, but it will be much easier than anything else
you are thinking about doing (IMHO).

then, once you have all the rules in @tests, you can 

foreach my $test (@tests) {
    $sql="INSERT INTO test_hits (msgid,rule,score) VALUES (?,?,?)";
    ..
    ..
    # undef binds as SQL NULL; pass 0 instead if score is NOT NULL
    $sth->execute($msgid,$test,undef);
}

i do use the score field, so i put
 _TESTSSCORES(,)_  as above, except with scores appended (eg. AWL=-3.0,...)
instead of 
 _TESTS(,)_        tests hit separated by , (or other separator)

in the X-Spam-Status: header. then in my foreach loop, i split again
on the = sign:

foreach my $test (@tests) {
    $sql="INSERT INTO test_hits (msgid,rule,score) VALUES (?,?,?)";
    ..
    ..
    # each entry looks like RULE=score; limit 2 keeps the score intact
    my ($rule,$score) = split(/=/,$test,2);
    $sth->execute($msgid,$rule,$score);
}
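
here is a small standalone sketch of that second split, without the
database part. rule_scores is a made-up name, and falling back to a
score of 0 when an entry has no =score part is my assumption, not
something SpamAssassin does for you:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "AWL=-3.0,BAYES_99=5.4" into a list of
# [rule, score] pairs.
sub rule_scores {
    my ($list) = @_;
    my @rows;
    for my $test (split /,/, $list) {
        next unless length $test;                   # skip empty entries
        my ($rule, $score) = split /=/, $test, 2;   # split on the first = only
        $score = 0 unless defined $score && length $score;  # assumed default
        push @rows, [ $rule, $score + 0 ];          # force a numeric score
    }
    return @rows;
}

my @rows = rule_scores("BAYES_99=5.4,AWL=-3.0,DATE_MISSING");
printf "%-16s %6.2f\n", @$_ for @rows;
```

each pair can then go straight into the prepared statement, along the
lines of $sth->execute($msgid, @$row).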

my test_hits table contains 5 columns

CREATE TABLE test_hits (
  id int(10) unsigned NOT NULL auto_increment,
  msgid varchar(254) NOT NULL default '',
  rule varchar(64) NOT NULL default '',
  score float(5,2) NOT NULL default '0.00',
  t timestamp(14) NOT NULL,
  PRIMARY KEY  (id),
  KEY msgid (msgid),
  KEY rule (rule)
) TYPE=MyISAM COMMENT='SpamAssassin Rule Matches';

and indices on msgid and rule, so i can easily show all rules that
match for a specific msgid, or show how many messages match a certain
rule....
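
for what it's worth, the two lookups could be written roughly like
this. the %query hash and its key names are only for illustration, and
the DBI calls are left in comments since they need a live $dbh:

```perl
use strict;
use warnings;

# Illustrative queries against the test_hits table above; the hash and
# its key names are hypothetical.
my %query = (
    # all rules (with scores) that matched one message
    rules_for_msgid =>
        "SELECT rule, score FROM test_hits WHERE msgid = ? ORDER BY rule",
    # number of distinct messages that matched one rule
    msgs_for_rule =>
        "SELECT COUNT(DISTINCT msgid) FROM test_hits WHERE rule = ?",
);

# With a connected DBI handle in $dbh these would be run as, e.g.:
#   my $rows    = $dbh->selectall_arrayref($query{rules_for_msgid}, undef, $msgid);
#   my ($count) = $dbh->selectrow_array($query{msgs_for_rule}, undef, 'BAYES_99');

print "$_:\n  $query{$_}\n" for sort keys %query;
```

both WHERE clauses filter on an indexed column (msgid or rule), which
is what those two keys are there for.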

you could extend as needed to include env_sender, recips, spam score,
etc....

enjoy, and good luck!

dallas


_______________________________________________
Qmail-scanner-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/qmail-scanner-general