[SAtalk] Memory utilization buildup

Steven Manross Mon, 29 Sep 2003 13:47:54 -0700

I'm parsing through a bunch of mail in a spam mail archive folder that
I'm using to test SA.

Here's the code, which is basically ripped out of the "bat" file with
minor tweaks and wrapped in a subroutine.

sub is_message_spam {
  my $message_text = $_[0];
  my %opt = ( 'create-prefs' => 1);
  my @array = split(/\n/,$message_text);
  my $mail = Mail::SpamAssassin::NoMailAudit->new ('data' => \@array);
  my $spamtest = new Mail::SpamAssassin ({
    PREFIX            => 'C:\Perl',
    DEF_RULES_DIR     => 'C:\Perl/share/spamassassin',
    LOCAL_RULES_DIR   => 'C:\Perl/etc/mail/spamassassin',
  });
  my $status = $spamtest->check ($mail);
  $status->rewrite_mail();
  $_[0] = $mail->header(). "\n". join ('', @{$mail->body()});

  my $rtn = $status->{'is_spam'};
  $status->finish();
  return $rtn;
}

I've roughly tracked that every "check" adds about 0.5 to 5 MB of
memory utilization.  I've also noticed that "timelog" seems to be the
source of the "leak", if you want to call it that.  I'm sure there's a
good reason that it's logging and using that memory.

After 50 mails, perl's consumed roughly 80MB - 90MB of RAM.

Considering I want to run through 1000 mails in this scenario, it
doesn't seem possible even though I have 768MB physical, and a page
file.

Is there some switch I can set to see if timelog is doing this (possibly
deleting its cache after every mail check(), or logging it to a file
(so that it can be dumped from memory), or temporarily disabling it)?
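
One thing that might be worth ruling out: the sub above builds a new
Mail::SpamAssassin object on every call. Below is a minimal sketch of
building such an object once and reusing it. It uses a hypothetical
stand-in class (Hypothetical::Scanner, not the real SA API) so that it
runs on its own, and it assumes, without my having confirmed it, that
per-call construction contributes to the growth.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive-to-build scanner such as
# Mail::SpamAssassin; it is NOT the real SpamAssassin API.
{
  package Hypothetical::Scanner;

  our $instances = 0;   # counts how many times new() has run

  sub new {
    my ($class) = @_;
    $instances++;
    # Pretend this is the costly part (rule parsing and so on).
    my $self = { patterns => [ qr/free money/i, qr/click here/i ] };
    return bless $self, $class;
  }

  sub check {
    my ($self, $text) = @_;
    for my $re (@{ $self->{patterns} }) {
      return 1 if $text =~ $re;
    }
    return 0;
  }
}

my $scanner;   # built on first use, then shared by every call

sub is_message_spam {
  my ($message_text) = @_;
  # Construct the scanner only if it does not exist yet.
  $scanner ||= Hypothetical::Scanner->new();
  return $scanner->check($message_text);
}
```

In the real sub, the same shape would mean moving the
"new Mail::SpamAssassin" call out of is_message_spam and keeping only
the per-message work (check, rewrite_mail, finish) inside it.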

Thanks for any help.

(running W2KSP4, perl 5.6, SA 2.55)

Steven 

-----Original Message-----
From: Justin Mason [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 29, 2003 9:58 AM
To: Jack Gostl
Cc: SpamAssassin listserve
Subject: Re: [SAtalk] Bayes 

Jack Gostl writes:
> That new Bayes algorithm is mighty touchy. So far it's tagged four real
> messages with a BAYES_99, three of them today alone. In just five days
> it has had twice the false positives that 2.55 had in four months.
> 
> That's a bit sensitive for my blood. I'm not sure I can release this 
> to my user community with that kind of sensitivity. My inclination is 
> to reduce the score on BAYES_99, but I'm open to suggestion.

Have you tried checking to see what tokens it's picking up, and what
they look like in the db?

1. get the message

2. run "spamassassin -D -t < msg > out", and look for the Bayes tokens
listing in the STDERR output

3. examine that, and compare with output from "sa-learn --dump" to see
what the token probabilities look like.

It could be that some bad training data has crept in somehow into the db
-- and that's why Bayes is FPing.

--j.

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf _______________________________________________
Spamassassin-talk mailing list [EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk

