I know there is a better way to go about this but for some reason the
solution escapes me.

 

Basically, what I am trying to do here is parse an email file, grab
the basics (from/to/subject), and put those in a small tab-separated
text database in the format

File NumRecipients From FromIP Subject Spam-Status

and then pass the contents along to the SpamAssassin pm to check the
status. The email file, however, contains these lines, which interfere
with SpamAssassin's filtering, so I have to remove them in order to get
an accurate spam score (I am using the pm, not the daemon, don't ask me
why :-))

---------------------

P I 19-10-2005 21:35:00 0000 ____ ____ < [EMAIL PROTECTED] >

O T

A domain.com [123.12.123.1]

S SMTP [IP ADDRESS]

R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]

R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]

R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]

R W 19-10-2005 21:35:00 0000 ____ _FY_ [EMAIL PROTECTED]

..........

---------------------
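To make the record layout mentioned above concrete, here is a minimal
sketch of building one tab-separated row. The helper name format_record,
the hash keys, and the sample values are all hypothetical; only the
column order comes from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-separated record in the column order
# File, NumRecipients, From, FromIP, Subject, Spam-Status.
sub format_record {
    my (%rec) = @_;
    my @columns = qw(file num_recipients from from_ip subject spam_status);
    return join("\t", map {
        my $value = defined $rec{$_} ? $rec{$_} : '';
        # A tab or newline inside a field would break the layout, so flatten it.
        $value =~ s/[\t\r\n]+/ /g;
        $value;
    } @columns);
}

# Made-up sample values, for illustration only.
print format_record(
    file           => '/tmp/example.msg',
    num_recipients => 4,
    from           => 'sender@example.com',
    from_ip        => '192.0.2.1',
    subject        => "Hello\tworld",
    spam_status    => 'unknown',
), "\n";
```

Missing fields come out as empty columns, so every row keeps six
fields.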

 

 

Right now it works the way it was intended, but it is super slow. I
believe this line is slowing it down significantly:

push(@lines, $_) if ($_ !~ /^P . (\d{1,2}\-\d{1,2}\-\d{4})
(\d{1,2}:\d{1,2}:\...........

 

 

 

Here is the snippet from my code

----<begin>--------

sub process_file {
    my @lines;

    if (-e $File::Find::name) {
        # Give up on this file if it cannot be opened, rather than
        # reading from a handle that was never opened.
        unless (open(EMAIL, '<', $File::Find::name)) {
            print "cannot open $File::Find::name for reading: $!\n";
            return;
        }

        while (<EMAIL>) {
            chomp;

            # determine senders
            if ($_ =~ /^P . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4}) .{1,4} .{1,4} \<(\w[*?+a-zA-Z_.\-0-9]+\@)(.*?)\>/) {
                # add to text database
            }

            # determine recipients
            if ($_ =~ /^R . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4}) .{1,4} .{1,4} \<(\w[*?+a-zA-Z_.\-0-9]+\@)(.*?)\>/) {
                # add to text database
            }

            # Determine the Submission address
            if ($_ =~ /^S SMTP\s+\[(\d+\.\d+\.\d+\.\d+)\]/) {
                # add to text database
            }

            # Determine the Subject
            # Subject: It's Here - WOW   World of Winners Online Auction
            if ($_ =~ /Subject: (.*)$/) {
            }

            # Mark X-Spam Status (from already marked mail)
            if ($option{S} == 1) {
                if ($_ =~ /X-Spam-Status: YES/) {
                    # add to text database
                }
            }

            # add lines we want into array
            push(@lines, $_) if ($_ !~ /^P . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4})|^R . (\d{1,2}\-\d{1,2}\-\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} \d{4,4})|^S SMTP\s+\[(\d+\.\d+\.\d+\.\d+)\]|^O L|^S DSN|^O T|^A r|X-Spam-Status: YES/);
        }

        # The dot before "msg" is escaped so that only a literal ".msg" matches.
        if (($File::Find::name =~ /\/home\/cgpro\/fe\d\/.*?\.msg/) && ($option{S} == 1) && ($message{$File::Find::name}->{'spam_status'} ne "X-Spam-Status: YES")) {
            shift @lines if ($lines[0] eq "");

            my $line = join("\n", @lines);
            my $mail = $spamtest->parse($line);
            my $status = $spamtest->check($mail);
            my $score = $status->get_score();

            print "$File::Find::name score $score\n";# if ($option{D} == 1);

            if ($status->is_spam()) {
                $message{$File::Find::name}->{'spam_status'} = "X-Spam-Status: Spam";
            }

            $status->finish();
            $mail->finish();
        }

        $filecount++;

        # Close the handle that was actually opened (EMAIL, not FILE).
        close(EMAIL);
    }
}

----<end>--------
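One possible variation on the filter line, as a minimal and unmeasured
sketch: the push test never uses its captures, so the same alternatives
can be written without capture groups and kept in a named pattern
outside the loop. Perl already compiles a constant pattern only once,
so any gain would come mainly from dropping the captures, and it may
well be small next to the SpamAssassin check itself. The names
$date_time, $skip_line and keep_lines are hypothetical, and the
addresses in the sample lines are made up.

```perl
use strict;
use warnings;

# Date and time portion shared by the "P" and "R" envelope lines.
my $date_time = qr/\d{1,2}-\d{1,2}-\d{4} \d{1,2}:\d{1,2}:\d{1,2} \d{4}/;

# Same alternatives as the push filter, without capture groups.
my $skip_line = qr/
      ^[PR]\ .\ $date_time
    | ^S\ SMTP\s+\[\d+\.\d+\.\d+\.\d+\]
    | ^O\ [LT]
    | ^S\ DSN
    | ^A\ r
    | X-Spam-Status:\ YES
/x;

# Hypothetical helper: return only the lines that do not match the filter.
sub keep_lines {
    return grep { $_ !~ $skip_line } @_;
}

my @kept = keep_lines(
    'P I 19-10-2005 21:35:00 0000 ____ ____ <sender@example.com>',
    'O T',
    'S SMTP [192.0.2.1]',
    'Subject: hello',
    '',
    'body text',
);
print "$_\n" for @kept;
```

Timing both versions on a real mailbox (for example with the core
Benchmark module) would show whether the filter is really the
bottleneck.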

 

 

Thanks,

 
