A fair statement as to what it is good for, yes. It could be used for
Bayesian body stuff - dunno how that's stacked up in your tests
(which I notice do include most headers) - but it's pretty limited
otherwise.

Note that the PR for these guys (CipherMail, or whatever the $25000
box is called; IronMail, I think) has been trumpeting it as "the
place" people should go to get spam and to submit spam.

Now they are welcome to obfuscate or find quality as they wish, and
if they use it only for certain things it's OK - but it seems very
poorly maintained. The filenames also don't carry versions/dates (the
way Justin's do) - which means when they correct the corpus, no one
will know.
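
For what it's worth, here is a rough sketch of the kind of thing I
mean: a versioned manifest with one checksum per file, so that anyone
holding an older copy can see which messages were corrected, added or
removed. The function names (build_manifest, changed_files) and the
manifest layout are made up for illustration; this is not anything
that corpus, or Justin's, actually ships.

```python
import hashlib
from pathlib import Path


def build_manifest(corpus_dir, version):
    """Return manifest lines: a version header, then 'sha256  relative/path' per file."""
    root = Path(corpus_dir)
    lines = [f"# corpus-version: {version}"]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    return lines


def changed_files(old_manifest, new_manifest):
    """Return sorted paths whose checksum differs, or that were added or removed."""
    def parse(lines):
        entries = {}
        for line in lines:
            # Skip the version header and any blank lines.
            if line.startswith("#") or not line.strip():
                continue
            digest, name = line.split("  ", 1)
            entries[name] = digest
        return entries

    old, new = parse(old_manifest), parse(new_manifest)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
```

Publishing the manifest next to each release, with the date in the
header line, would let people diff two releases instead of guessing
whether anything changed.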


sigh, oh well. Just thought it was worth a heads up. Again, I expect
people even less intelligent than me <g> will come here and post
stuff about that corpus....so git prepared to grit yer teeth.

Out of an evil sense of malice <g>, here's an example of one of their
falsely included messages which IMO doesn't belong in the corpus - it
is simply NOT spam per se.

Now, if they provided an index listing true spam versus hard ham,
this would be acceptable:



subject: NCCI will be moving
x-mailer: GoldMine [5.50.10424]
content-type: text/plain
message-id: <[EMAIL PROTECTED]>
organization: National Creditors Connection Inc
mime-version: 1.0
date: Wed, 30 Oct 2002 13:51:56 -0800
from: Michele Connell <[EMAIL PROTECTED]>
to: [EMAIL PROTECTED]


To all clients:

Effective November 1st, please be advised that NCCI will be
relocating to:

14 Orchard Road, Suite 200, Lake Forest, CA 92630

Our new phone numbers will be (949) 461-7540 and FAX (949) 581-6080.

Our toll-free numbers of (800) 300-0743 and FAX (800) 711-6346 will
remain the same.

If you are currently assigning using our email template, nothing will
change, you may still continue to use the template you have.  If you
are assigning via fax and would like new forms, please contact Cindy
at x228.  If you are assigning via fax and have email, we would like
to convert you to email template assigning.  There are many benefits
to this.  Please call Cindy at ext. 228, for more information.

We like to thank everyone in advance for your patience and
understanding.

Sincerely,

Michele Connell




--- Matthew Davis <[EMAIL PROTECTED]> wrote:
> * Michael Bell ([EMAIL PROTECTED]) wrote:
> > Agreed. I think it's worthless too. Just wanted to bring up the
> > topic, so we could all be prepared for newbies asking the question.
> > Now we have a thread to point to
> >
> > Here's an example of their substandard corpus. Note that while
> > looking for an example, I ran into another false positive out of 5
> > (a "we're relocating our address to...")
> > 
> > Note how much has been stripped
> 
> So if I understand this correctly, the only real point of this
> spamarchive is to perform body spam checks?  Most of the headers
> have been stripped or munged by humans.
>
> But the headers are really where we can catch most of the spammers,
> so I see it as a good effort, but useless.
> 
> --
> Matthew Davis
> http://dogpound.vnet.net/
> ----------------------------------------------------------------
> A new standard in obfuscation, ambiguity, & equivocation.
> ----------------------------------------------------------------
> Sunday, December 01, 2002 / 05:16PM


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
