> Now having just said that, I've realized that one thing Justin didn't
> give me access to (I don't think) is the corpus before it's been passed
> through mass-check!  Hopefully you're still there Justin, and we can
> figure something out there.

Craig --

still here, ish -- on dialup and webmail only ;)

oops. I'll see what I can rig up.  Most of the corpus is too big to copy 
around, esp. with my current setup, but I can copy over the borderline 
messages and some online archives of spam/nonspam into the corpus area of 
the site. Then it's just a matter of getting hold of nonspam archives
(spam archives are plentiful in the existing corpus) and using those;
I've been leaning towards downloading MailMan mail archives recently BTW, 
and hand-grepping for spam messages and deleting 'em.

BTW non-English non-spam corpuses/ii are very welcome.  We definitely lean 
towards an English-speaking-techie angle at the moment ;)

Re: the /doc/ thing.  That comes and goes, and I *still* haven't figured
out what's happening BTW ;)


Spamassassin-talk mailing list
