A fair statement as to what it is good for, yes. It could be used for Bayesian body stuff - dunno how that's stacked up in your tests (which I notice do include most headers) - but it's pretty limited otherwise.
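For anyone wondering what "body stuff" amounts to in practice, here's a rough Python sketch: it counts word tokens from the text/plain body only and ignores every header, which is about all a header-stripped corpus can feed a Bayesian learner. The function name `body_tokens` and the tokenizer regex are made up for illustration; this is not how SpamAssassin tokenizes.

```python
import email
import re
from collections import Counter


def body_tokens(raw_message: str) -> Counter:
    """Count lowercase word tokens in text/plain parts only; headers are ignored."""
    msg = email.message_from_string(raw_message)
    counts = Counter()
    for part in msg.walk():
        # Skip multipart containers, HTML parts, attachments, etc.
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "ascii"
        text = payload.decode(charset, errors="replace")
        # Hypothetical tokenizer: runs of letters, digits, '$' and apostrophes.
        counts.update(re.findall(r"[a-z0-9$']+", text.lower()))
    return counts


# Toy message: "moving" appears only in the Subject header, so it is never counted.
sample = (
    "Subject: NCCI will be moving\n"
    "Content-Type: text/plain\n"
    "\n"
    "To all clients: NCCI will be relocating.\n"
)
tokens = body_tokens(sample)
```

Counts like these could go into whatever Bayesian trainer you use; the point is only that nothing from the headers contributes.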
Note that the PR for these guys (CipherMail or whatever the $25000 box is called - Ironmail, I think) has been trumpeting it as "the place" people should go for spam and send spam. Now they are welcome to obfuscate or find quality as they wish, and if they use it only for certain things it's ok - but it seems very poorly maintained. The filenames also don't have versions/dates (like justin) - which means when they correct the corpus, no one will know. sigh, oh well.

Just thought it was worth a heads up. Again, I expect people even less intelligent than me <g> will come here and post stuff about that corpus....so git prepared to grit yer teeth.

Out of an evil sense of malice <g>, here's an example of one of their falsely included messages which IMO doesn't belong in the corpus - it is simply NOT spam per se. Now, if they provided an index listing true spam versus hard ham, this would be acceptable:

subject: NCCI will be moving
x-mailer: GoldMine [5.50.10424]
content-type: text/plain
message-id: <[EMAIL PROTECTED]>
organization: National Creditors Connection Inc
mime-version: 1.0
date: Wed, 30 Oct 2002 13:51:56 -0800
from: Michele Connell <[EMAIL PROTECTED]>
to: [EMAIL PROTECTED]

To all clients:

Effective November 1st, please be advised that NCCI will be relocating to: 14 Orchard Road, Suite 200, Lake Forest, CA 92630

Our new phone numbers will be (949) 461-7540 and FAX (949) 581-6080. Our toll-free numbers of (800) 300-0743 and FAX (800) 711-6346 will remain the same.

If you are currently assigning using our email template, nothing will change, you may still continue to use the template you have. If you are assigning via fax and would like new forms, please contact Cindy at x228. If you are assigning via fax and have email, we would like to convert you to email template assigning. There are many benefits to this. Please call Cindy at ext. 228, for more information.

We like to thank everyone in advance for your patience and understanding.
Sincerely,
Michele Connell

--- Matthew Davis <[EMAIL PROTECTED]> wrote:
> * Michael Bell ([EMAIL PROTECTED]) wrote:
> > Agreed. I think it's worthless too. Just wanted to bring up the
> > topic, so we could all be prepared for newbies asking the question.
> > Now we have a thread to point to
> >
> > Here's an example of their substandard corpus. Note that while
> > looking for an example, i ran into another false positive out of 5 (a
> > "we're relocating our address to...")
> >
> > Note how much has been stripped
>
> So if I understand this correctly, the only real point of this
> spamarchive is to preform body spam checks? Most of the headers
> have been stripped or munged by humans.
>
> But the headers is really were we can catch most of the spammers,
> so i see it as a good effort, but useless.
>
> --
> Matthew Davis
> http://dogpound.vnet.net/
> ----------------------------------------------------------------
> A new standard in obfuscation, ambiguity, & equivocation.
> ----------------------------------------------------------------
> Sunday, December 01, 2002 / 05:16PM

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk