OK, I'll go look at what was in CVS and build the word list from there.  I agree
on the number-of-words issue.  We can probably get around that by calculating
the percentage of words that are on the list instead of using a hard threshold,
i.e. more like the spam phrases code, where it comes up with a "porn phrase" score...
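
A rough sketch of the percentage idea, written in Python purely for illustration (it is not SpamAssassin code, and the real test would be a Perl eval test). The names WORD_LIST and listed_word_percentage are made up for this example, and the placeholder words stand in for whatever ends up in the list built from CVS:

```python
import re

# Hypothetical placeholder list; the real one would be built from CVS.
WORD_LIST = {"foo", "bar", "baz"}


def listed_word_percentage(body, word_list=WORD_LIST):
    """Return the percentage (0-100) of words in body that are on word_list."""
    # Crude tokenizer: runs of letters and apostrophes, case-folded.
    words = re.findall(r"[a-z']+", body.lower())
    if not words:
        # Empty body: nothing to score, and avoids dividing by zero.
        return 0.0
    hits = sum(1 for w in words if w in word_list)
    return 100.0 * hits / len(words)
```

A long message with a handful of listed words then scores low, while a short message made up mostly of listed words scores high, which is the behaviour a fixed count cannot give.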

C

Michael Moncur wrote:

MM> > Yeah, while incorporating it then running make test I've found a few
MM> > issues :)
MM> > For starters, the \b$word\b is not right in all cases.  Want to trap
MM> > -ed, -ing, -es, etc. suffixes on the verbs, among other things.  It's a
MM> > good starting base though.
MM>
MM> While you're looking at it, here are a couple more issues - first, isn't the
MM> eval test looking for three porn-like words *in the entire body* while the
MM> current PORN_3 looks for three words separated by 0-15 characters? Wouldn't
MM> this make it more likely to trigger as a false positive? Perhaps a count
MM> higher than 3 would be better?
MM>
MM> Second, the @porn_words Daniel posted doesn't include all of the words that
MM> PORN_3 does. It's missing everything from, er, "whore" to "titties" in the
MM> current CVS.
MM>
MM> I'm sure it's better than the current PORN_3 regardless.
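
To make the two matching issues from this thread concrete, here is a small Python sketch (an illustration only, not the actual PORN_3 rule or eval test). It assumes a placeholder stem list, a simple optional-suffix group, and reads "three words separated by 0-15 characters" as up to 15 arbitrary characters between consecutive matches. STEMS, body_count and has_three_nearby are hypothetical names:

```python
import re

# Hypothetical placeholder stems; the real list would come from CVS.
STEMS = ["foo", "bar", "baz"]

# Optional common suffixes, so "bared" and "bazes" match as well as "bar".
# Does not handle doubled consonants ("barred") or other irregular forms.
SUFFIX = r"(?:s|es|ed|ing)?"
WORD = r"\b(?:" + "|".join(re.escape(s) for s in STEMS) + r")" + SUFFIX + r"\b"

# Counts every listed word anywhere in the body.
WORD_RE = re.compile(WORD, re.IGNORECASE)

# Three listed words, each consecutive pair separated by 0-15 characters.
NEAR_RE = re.compile(
    WORD + r"(?:.{0,15}?" + WORD + r"){2}",
    re.IGNORECASE | re.DOTALL,
)


def body_count(body):
    """Number of listed words (with suffixes) in the entire body."""
    return len(WORD_RE.findall(body))


def has_three_nearby(body):
    """True if three listed words occur within 0-15 characters of each other."""
    return NEAR_RE.search(body) is not None
```

With these definitions, a body whose three listed words are spread far apart reaches a whole-body count of 3 but fails the proximity check, which is the false-positive concern raised above for a body-wide threshold of 3.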


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
