Arpi wrote:
> I doubt the non-spam folder is 100% spam-free.
> There are few (<100) hits of few rule which shouldn't be hit by non-spam
> at all. Maybe these mails should be manually verified...

I don't think it's fully spam-free either -- in fact there's one submitter of nonspam in particular who still has quite a few spammy-looking signatures in his submitted mails. He and I have both done quite a bit of work on cleaning it up though, which is partly why 2.1 took a little longer to release than expected.

By and large, I think we're now at a point where the quality of the corpus is very high. It might not be fully comprehensive, or a representative sample of emails from the right fields, but I think there are few spams in the nonspam list, and those still in there are not substantially altering the scores being produced.

A bigger problem, I would say, is rules which are not achieving their aims (like the RATWARE rule triggering on YMR). Fixing those will achieve better results than spending hours checking 68,000 emails in mail archives.

C

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk