Matthew Cline wrote:

>First a few rules to match non-spam:
>
>  body     SIGNATURE_DELIM        /^-- $/
>  describe SIGNATURE_DELIM        Standard signature delimiter present  
>
>While there would be no effort in faking this, it might take a while for some of the 
>spammers to catch on.
>
I have a question -- is this supposed to be a negative score? I mean, it 
looks like it should be, but it would be useful if you could give scores 
for these.

>uri      HTTPS_URL              /https:\/\//
>  describe HTTPS_URL              Spammers don't often use HTTPS
>
>Has anyone seen spam that uses an HTTPS URI?
>
Yes.

>header   MAJORDOMO              Subject =~ /Majordomo (?:request )?results/
>  describe MAJORDOMO              From Majordomo
>
>Majordomo results should definitely not be marked as spam, and spammers are probably 
>unlikely to stick "Majordomo results" in their subject.
>
Concur, but no more than -1.

>And now a bunch of spam matching rules:
>
Comment: why aren't some of these spamphrase items?

>header   PLEASE_READ            Subject =~ /please read/i
>describe PLEASE_READ            Please read this!  Please oh please oh please!
>
>header   SUSPICIOUS_FROM        From =~ /(?:sales|money|credit|alert|market|affiliat|unsubscribe|offer|important)/i
>describe SUSPICIOUS_FROM        Suspicious phrase in "From" header
>
>body     READ_TO_END            /read this (?:e-?mail )?to the end/i
>describe READ_TO_END            You'd better read all of this spam!
>
>body     DONT_DELETE            /(?:don'?t delete this|do not delete)/i
>describe DONT_DELETE            Don't delete me!  Nooooo!!!!
>
>body     REAL_THING             /the real thing/i
>describe REAL_THING             It's the real thing, baby!
>
>body     WORKED_FOR_ME          /worked for me/i
>describe WORKED_FOR_ME          It worked for the spammer, why not for you?
>
>body     ALL_NATURAL            /100% natural/i
>describe ALL_NATURAL            Spam is 100% natural?!
>
>body     MONEY_BACK             /money back guarantee/i
>describe MONEY_BACK             Money back guarantee.
>
>body     NO_CATCH               /there is no catch/i
>describe NO_CATCH               There is no catch.
>
>body     NO_OBLIGATION          /no obligation/i
>describe NO_OBLIGATION          There is no obligation.
>
I like these.

>body     NO_DISSAPOINTMENT      /You won'?t be diss?apointed/i
>describe NO_DISSAPOINTMENT      You won't be dissapointed.
>
... though I'm disappointed at your spelling of "disappointment". How about

body NO_DISAPPOINTMENT /You won'?t be diss?app?ointed/i
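
A quick way to sanity-check a pattern like this outside SpamAssassin is to run it over a few spellings. This is only a Python sketch: the sample strings are made up, re.IGNORECASE stands in for the /i flag, and the optional apostrophe ('?) is carried over from the original rule as an assumption.

```python
import re

# Hypothetical stand-in for the rule body; this is not SpamAssassin itself.
# The optional apostrophe ('?) is carried over from the original rule.
NO_DISAPPOINTMENT = re.compile(r"You won'?t be diss?app?ointed", re.IGNORECASE)

# Made-up sample phrases: the correct spelling, common misspellings, one miss.
samples = [
    "You won't be disappointed",
    "You wont be dissapointed",
    "you won't be dissappointed",
    "You won't be disapointed",
    "You will be delighted",
]

def hits(pattern, lines):
    """Return the lines in which the pattern is found anywhere."""
    return [line for line in lines if pattern.search(line)]

matched = hits(NO_DISAPPOINTMENT, samples)
for line in samples:
    print(("hit   " if line in matched else "miss  ") + line)
```

The four spelling variants hit and the unrelated phrase misses, so the one pattern covers both the correct spelling and the usual mistakes.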

>body     SERIOUS_ONLY           /Serious (?:Enqueries|Inquiries) Only/i
>
body SERIOUS_ONLY /Serious [ie]nqu[ie]ries only/i

BTW, does SA remove newlines and carriage returns prior to running these 
tests? That is, would this fail on

blah blah blah blah yap yap yap yap yap yap reet reet reet spam spam spam.  Serious
inquiries only.

???
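
To show why the question matters: if a body rule were applied to text that still contained the line break, a pattern with literal spaces would miss the wrapped phrase, while one using \s+ would not. The Python sketch below covers only that regex behaviour. It says nothing about what SA actually does to the body before matching, and the sample text, the \s+ variant and the unwrap() helper are made up for illustration.

```python
import re

# Made-up body text with the phrase wrapped across a line break.
wrapped = "reet reet reet spam spam spam.  Serious\ninquiries only."

# The rule as proposed, with literal spaces between the words.
literal_space = re.compile(r"Serious [ie]nqu[ie]ries only", re.IGNORECASE)

# Hypothetical variant: \s+ accepts any run of whitespace, newlines included.
any_whitespace = re.compile(r"Serious\s+[ie]nqu[ie]ries\s+only", re.IGNORECASE)

def unwrap(text):
    """Collapse every run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text)

raw_literal = bool(literal_space.search(wrapped))        # newline is not a space
raw_flexible = bool(any_whitespace.search(wrapped))      # \s+ spans the newline
unwrapped_literal = bool(literal_space.search(unwrap(wrapped)))

print(raw_literal, raw_flexible, unwrapped_literal)
```

So the literal-space rule is only safe if the text is unwrapped before the test runs; if it is not, writing \s+ between the words would be the defensive choice.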

>describe SERIOUS_ONLY           Serious Enqueries Only.
>
>body     RISK_FREE              /risk free/i
>describe RISK_FREE              Risk free.  Suuurreeee....
>
># "as seen on:" or "as seen on ..."
>body     AS_SEEN_ON             /as seen on(?::|\s*(?:NATIONAL|TV|ABC|NBC|CBS|CNN|Oprah|USA Today|48 Hours|New York Times))/i
>
What does the second half of the second group buy you? Wouldn't this 
match on

as seen on:

but also

as seen on <any phrase above>

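but, after a colon, never get as far as the outlet in

as seen on: ABC TV

(the rule still fires there, but only on the bare "as seen on:").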

It seems to me that this might be better written as

body AS_SEEN_ON /as seen on:?\s*.{1,20}(?:national|TV|ABC|NBC|CBS|CNN|Oprah|USA Today|48 Hours|New York Times)/i

Of course, the problem is that you could spend a long time enumerating 
media outlets; wouldn't it be easier just to test for

body AS_SEEN_ON /as seen on/i

???
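
One way to weigh the two options is to run both over a few phrases. In this Python sketch the sample strings are invented and the /i flag of the original rule is assumed on both patterns (re.IGNORECASE here). Because the longer pattern begins with the same literal text, the short one fires on everything the long one does, plus outlets nobody thought to list.

```python
import re

# Both patterns assume the /i flag of the original rule (re.IGNORECASE here).
with_outlets = re.compile(
    r"as seen on:?\s*.{1,20}(?:national|TV|ABC|NBC|CBS|CNN|Oprah|USA Today"
    r"|48 Hours|New York Times)",
    re.IGNORECASE,
)
simple = re.compile(r"as seen on", re.IGNORECASE)

# Invented sample phrases; "Channel 5 News" is an outlet missing from the list.
samples = [
    "As seen on: ABC TV",
    "as seen on national TV",
    "As Seen On Oprah",
    "as seen on Channel 5 News",
    "as seen in the brochure",
]

long_hits = [s for s in samples if with_outlets.search(s)]
short_hits = [s for s in samples if simple.search(s)]

for s in samples:
    print(f"{s!r}: outlets={s in long_hits} simple={s in short_hits}")
```

Whether the extra hits are wanted is a scoring question; the sketch only shows that the outlet list narrows the match rather than widening it.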

>describe AS_SEEN_ON             As seen on national TV!
>
>body     NOT_INTENDED           /not intended for residents ?(?:of|in)/i
>describe NOT_INTENDED           Not intended for residents of XYZ.
>
># This phrase appears in many pyramid scheme mails in which
># "My Wife Jody" testimonials are absent
>body     COPY_ACCURATELY        /copy.{1,10}name?.{1,10}address.{1,10}ACCURATELY/i
>describe COPY_ACCURATELY        Common pyramid scheme phrase (1)
>
>body     SEE_FOR_YOURSELF       /See (?:for|it) yourself/i
>describe SEE_FOR_YOURSELF       See for yourself
>
># How many non-spammers send HTML mails that use Flash?
>rawbody  EMBEDED_OBJECT         /<(?:object|embed)/i
>describe EMBEDED_OBJECT         Flash or similar plugin in HTML
>
>uri      BIZ_HTTP_ADDR          /https?\:\/\/[^\/]+\.biz\//i
>describe BIZ_HTTP_ADDR          URI with a .biz domain
>
Overall, I like these.

-- 
          http://www.pricegrabber.com | Dog is my co-pilot.

                                   




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
