I'm parsing through a bunch of mail in a spam mail archive folder that I am using to test SA on.
Here's the code. It's basically lifted from the "bat" file, with minor tweaks, and wrapped in a subroutine:

    sub is_message_spam {
        my $message_text = $_[0];
        my %opt = ('create-prefs' => 1);
        # split at line starts so each line keeps its "\n", as reading <STDIN> would
        my @array = split(/^/m, $message_text);
        my $mail = Mail::SpamAssassin::NoMailAudit->new('data' => \@array);
        my $spamtest = Mail::SpamAssassin->new({
            PREFIX          => 'C:\Perl',
            DEF_RULES_DIR   => 'C:\Perl/share/spamassassin',
            LOCAL_RULES_DIR => 'C:\Perl/etc/mail/spamassassin',
        });
        my $status = $spamtest->check($mail);
        $status->rewrite_mail();
        # write the rewritten message back into the caller's variable
        $_[0] = $mail->header() . "\n" . join('', @{$mail->body()});
        my $rtn = $status->{'is_spam'};
        $status->finish();
        return $rtn;
    }

As far as I can tell, every call to check() adds roughly 0.5-5 MB of memory use. I've also noticed that "timelog" seems to be the source of the "leak", if you want to call it that. I'm sure there's a good reason it's logging and holding on to that memory.

After 50 messages, perl has consumed roughly 80-90 MB of RAM. I want to run 1000 messages through this. At that rate it would need something like 1.6-1.8 GB, so it doesn't look possible even though I have 768 MB of physical memory plus a page file.

Is there a switch I can set to find out whether timelog is responsible? For example, a way to clear its cache after every check(), to log to a file instead (so the data can be dropped from memory), or to disable it temporarily?

Thanks for any help.
(Running W2K SP4, Perl 5.6, SA 2.55)

Steven

-----Original Message-----
From: Justin Mason [mailto:[EMAIL PROTECTED]
Sent: Monday, September 29, 2003 9:58 AM
To: Jack Gostl
Cc: SpamAssassin listserve
Subject: Re: [SAtalk] Bayes

Jack Gostl writes:
> That new Bayes algorithm is mighty touchy. So far it's tagged four real
> messages with a BAYES_99, three of them today alone. In just five days
> it has had twice the false positives that 2.55 had in four months.
>
> That's a bit sensitive for my blood. I'm not sure I can release this
> to my user community with that kind of sensitivity.
> My inclination is to reduce the score on BAYES_99, but I'm open to
> suggestion.

Have you tried checking which tokens it's picking up, and what they look like in the db?

1. Get the message.
2. Run "spamassassin -D -t < msg > out", and look for the Bayes token listing in the STDERR output.
3. Examine that, and compare it with the output of "sa-learn --dump" to see what the token probabilities look like.

It could be that some bad training data has somehow crept into the db, and that's why Bayes is FPing.

--j.

-------------------------------------------------------
This sf.net email is sponsored by: ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk