I use hand-sorted mailboxes:
"spam": 1150 e-mails
"ham": 662 e-mails

"mixed" is the one that I run SpamAssassin on; it has 1812 e-mails (basically
ham and spam combined).

I do my first run of SA, check how much mail from "mixed" it sorted into
spam.mbox and ham.mbox (sorted by SA, unlike ham and spam w/o the .mbox),
then teach it with:
sa-learn --mbox --spam /path/to/spam
(Learned from 993 message(s) (1150 message(s) examined).)

sa-learn --mbox --ham /path/to/ham
(Learned from 273 message(s) (662 message(s) examined).)

After that I run SA again.
I would imagine that after learning on "ham" and "spam" it should get all
the messages in "mixed" right, but it doesn't!
(SA 2.64 did after the same procedure)

As for bayes stats:
ham.mbox.1st -> all three suggested greps return 0
spam.mbox.1st -> all three suggested greps return 0

ham.mbox.2nd -> bayes 00 returns 662, bayes 50 & 90 return 0, bayes 99
returns 188
spam.mbox.2nd -> all three suggested greps return 0, grep BAYES_99 returns
1920
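
For anyone who wants to reproduce these counts in one pass, here is a rough sketch of the suggested greps wrapped in a small function. The function name bayes_stats is made up for illustration, and it assumes every message in the mbox starts with a "From " separator line and that the mailboxes are named ham.mbox and spam.mbox as above.

```shell
#!/bin/sh
# Print the message count and the BAYES_* hit counts for one mbox file.
# Assumption: each message begins with an mbox "From " separator line.
# Note: grep counts matching LINES, not messages.
bayes_stats() {
    mbox=$1
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    total=$(grep -c '^From ' "$mbox" || true)
    printf '%s: %s messages\n' "$mbox" "$total"
    for rule in BAYES_00 BAYES_50 BAYES_99; do
        hits=$(grep -c "$rule" "$mbox" || true)
        printf '  %s: %s\n' "$rule" "$hits"
    done
}

# Mailbox names as used in this thread; files that do not exist are skipped.
for f in ham.mbox spam.mbox; do
    if [ -f "$f" ]; then
        bayes_stats "$f"
    fi
done
```

Printing the message total next to the hit counts makes it easy to spot a count that is larger than the number of messages, which would mean the rule name matches on more than one line in at least one message.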

I think something somewhere went very wrong, what am I not doing right?

Tnx,
Insems


> At 03:28 AM 10/5/2004, Insems Citam wrote:
> >Still no idea what could be wrong, any ideas guys?
>
> No, because I still don't have a clear vision of what you did :)
>
> What criteria did you use when training? did you just train the spam and
> ham mailboxes that SA generated? or did you hand-sort prior to training?
>
> If you just trained SA based on its previous classifications without any
> hand sorting, why would you expect any sort of change at all?
>
> Are the BAYES_* rules showing up in the Spam-Status headers of the messages
> in the second run? What's the mix like?
>
> Try gathering some bayes-specific stats using grep: I'd suggest looking at
> bayes 00, 50, and 99 to start with.
>
>        grep BAYES_00 spam.mbox |wc -l
>        grep BAYES_50 spam.mbox |wc -l
>        grep BAYES_99 spam.mbox |wc -l
>
>        grep BAYES_00 ham.mbox |wc -l
>        grep BAYES_50 ham.mbox |wc -l
>        grep BAYES_99 ham.mbox |wc -l
>
> Compare those to the total counts of each mailbox.
>
>

