I know there is a better way to go about this, but for some reason the
solution escapes me.
Basically, what I am trying to do here is parse an email file, grab
the basics (from/to/subject), and put them in a small tab-separated
text database in the format
File NumRecipients From FromIP Subject Spam-Status
and then pass the contents along to the SpamAssassin pm to check the
status. However, the email file contains the lines shown below, which
interfere with SpamAssassin's filtering, so I have to remove them in
order to get an accurate spam score (I am using the pm, not the daemon;
don't ask me why :-))
---------------------
P I 19-10-2005 21:35:00 0000 ____ ____ < [EMAIL PROTECTED] >
O T
A domain.com [123.12.123.1]
S SMTP [IP ADDRESS]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]
..........
---------------------
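To make the target record concrete, here is a minimal sketch of how one
line of the tab-separated database could be assembled from lines like
the ones above. The helper name build_record, the file name, the
addresses, and the submission IP are all made up for illustration (the
real values are masked), and the patterns are deliberately simpler than
the ones in my real code.

```perl
use strict;
use warnings;

# Hypothetical sample: addresses, IP, and subject are invented because
# the real values are masked in the listing above.
my @sample = (
    'P I 19-10-2005 21:35:00 0000 ____ ____ <sender@example.com>',
    'O T',
    'A domain.com [123.12.123.1]',
    'S SMTP [192.0.2.10]',
    'R W 19-10-2005 21:35:00 0000 ____ _FY_ <rcpt1@example.org>',
    'R W 19-10-2005 21:35:00 0000 ____ _FY_ <rcpt2@example.org>',
    'Subject: test message',
);

# build_record is a hypothetical helper. It returns one tab-separated
# line in the order:
# File, NumRecipients, From, FromIP, Subject, Spam-Status
sub build_record {
    my ($file, @lines) = @_;
    my %rec = (rcpts => 0, from => '', ip => '', subject => '', status => '');
    for (@lines) {
        if    (/^P .*<([^>]+)>/)         { $rec{from} = $1 }
        elsif (/^R .*<[^>]+>/)           { $rec{rcpts}++ }
        elsif (/^S SMTP\s+\[([\d.]+)\]/) { $rec{ip} = $1 }
        elsif (/^Subject: (.*)$/)        { $rec{subject} = $1 }
        elsif (/^X-Spam-Status: YES/)    { $rec{status} = 'X-Spam-Status: YES' }
    }
    return join("\t", $file, @rec{qw(rcpts from ip subject status)});
}

print build_record('1.msg', @sample), "\n";
```

The Spam-Status column stays empty here because the sample has no
X-Spam-Status header.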
Right now it works the way it was intended, but it is super slow. I
believe this line is slowing it down significantly:
push(@lines, $_) if ($_ !~ /^P . (\d{1,2}\-\d{1,2}\-\d{4})
(\d{1,2}:\d{1,2}:\...........
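One variation that might be worth benchmarking is sketched below. It
assumes the lines to drop can be recognized by their leading tag alone,
without re-checking the full date and time fields, and it compiles the
pattern once with qr// and drops the capture groups. The helper name
keep_lines and the sample data are hypothetical, and I have not measured
whether this is actually faster.

```perl
use strict;
use warnings;

# Assumption: a leading tag (plus one digit for P/R lines) is enough to
# identify the lines to drop. This is looser than the original pattern.
my $skip = qr/^(?:[PR] . \d|S (?:SMTP|DSN)|O [LT]|A r)|X-Spam-Status: YES/;

# keep_lines is a hypothetical helper: it returns only the lines that
# would be handed on to SpamAssassin.
sub keep_lines {
    return grep { $_ !~ $skip } @_;
}

my @kept = keep_lines(
    'P I 19-10-2005 21:35:00 0000 ____ ____ <sender@example.com>',
    'O T',
    'S SMTP [192.0.2.10]',
    'Subject: test message',
    '',
    'Body text',
);
print "$_\n" for @kept;
```

With this sample, only the Subject line, the blank line, and the body
line are kept.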
Here is the snippet from my code
----<begin>--------
sub process_file {
    my @lines;
    if (-e $File::Find::name) {
        # Skip this file if it cannot be opened instead of reading from a dead handle.
        open(EMAIL, '<', $File::Find::name) or do {
            print "cannot open $File::Find::name for reading: $!\n";
            return;
        };
        while (<EMAIL>) {
            chomp;
            # determine senders
            if ($_ =~ /^P . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4}) .{1,4} .{1,4} \<(\w[*?+a-zA-Z_.\-0-9]+\@)(.*?)\>/) {
                # add to text database
            }
            # determine recipients
            if ($_ =~ /^R . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4}) .{1,4} .{1,4} \<(\w[*?+a-zA-Z_.\-0-9]+\@)(.*?)\>/) {
                # add to text database
            }
            # determine the submission address
            if ($_ =~ /^S SMTP\s+\[(\d+\.\d+\.\d+\.\d+)\]/) {
                # add to text database
            }
            # determine the subject, e.g.
            # Subject: It's Here - WOW World of Winners Online Auction
            if ($_ =~ /Subject: (.*)$/) {
            }
            # mark X-Spam-Status (from already marked mail)
            if ($option{S} == 1) {
                if ($_ =~ /X-Spam-Status: YES/) {
                    # add to text database
                }
            }
            # add the lines we want into the array
            push(@lines, $_) if ($_ !~ /^P . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4})|^R . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4})|^S SMTP\s+\[(\d+\.\d+\.\d+\.\d+)\]|^O L|^S DSN|^O T|^A r|X-Spam-Status: YES/);
        }
        if (($File::Find::name =~ m{/home/cgpro/fe\d/.*?\.msg})
            && ($option{S} == 1)
            && ($message{$File::Find::name}->{'spam_status'} ne "X-Spam-Status: YES")) {
            # drop a leading empty line, guarding against an empty array
            shift @lines if (@lines && $lines[0] eq "");
            my $line   = join("\n", @lines);
            my $mail   = $spamtest->parse($line);
            my $status = $spamtest->check($mail);
            my $score  = $status->get_score();
            print "$File::Find::name score $score\n";# if ($option{D} == 1);
            if ($status->is_spam()) {
                $message{$File::Find::name}->{'spam_status'} = "X-Spam-Status: Spam";
            }
            $status->finish();
            $mail->finish();
        }
        $filecount++;
        # close the handle that was actually opened (EMAIL, not FILE)
        close(EMAIL);
    }
}
----<end>--------
Thanks,