Arpi wrote:

> I doubt the non-spam folder is 100% spam-free.
> There are a few (<100) hits of a few rules which shouldn't be hit by non-spam
> at all. Maybe these mails should be manually verified...

I don't think it's fully spam-free either. In fact, there's one submitter of 
nonspam in particular who still has quite a few spammy-looking signatures in 
his submitted mails.  He and I have both done quite a bit of work on cleaning it 
up though, which is partly why 2.1 took a little longer to release than 
expected.  By and large, I think we're now at a point where the quality of the 
corpus is very high.  It might not be a fully representative sample of emails 
from the right fields, but I think there are few spams in the 
nonspam list, and those still in there are not substantially altering the scores 
being produced.  A bigger problem, I would say, is rules which are not achieving 
their aims (like the RATWARE rule triggering on YMR), and fixing those will 
achieve better results than spending hours checking 68,000 emails in mail 
archives.

C


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
