Hi Jennifer,

> -----Original Message-----
> From: jennifer
> Sent: Sunday, October 19, 2003 7:03 PM
> To: 'Larry Gilson'; [EMAIL PROTECTED]
> Subject: RE: [SAtalk] [RD] Popcorn, Backhair, and Weeds
> 
> 
> Hi Larry,
> 
> (I added RD since this has turned into rule discussion.  Hope that is
> ok)

Not a problem.  I am still not used to having to hit 'reply to all'.  Most
lists I belong to inject a 'Reply-To:' header with the list's email address.


> Well we might have inadvertently merged popcorn and backhair. 
>  You gave two suggestions, and because I've been so busy, I 
> didn't immediately get around to testing.  Yesterday I 
> started testing your consolidation, but I used the wrong one. 
>  You corrected your second rule when you were showing me an 
> easier way to type the expression I had written (when you 
> moved the "!" outside the set.)
> 
> /[>\s]\w{1}<![\w\s\$&!-]{0,150}>\w{1}\W/
> 
> but when I was going through emails to find the rule to test, 
> I accidentally tested your first edit, which made the "!" optional.  
> 
> /[>\s]\w{1}<[\w\s\$&!-]{0,150}>\w{1}\W/
> 
> (using a score of zero, and also running the other two sets, 
> popcorn & backhair).  In every email I've checked today, that 
> rule is matching rule for rule with popcorn, and the same 
> thing with backhair.  I haven't seen any false positives, so 
> it could very well be that this will work. ?? I think the 
> thing that is keeping it safe is that this is an html tag 
> bracketed by characters, rather than tags bracketing 
> words...which should be the case with html.  The ends of the 
> expression basically are looking for safeguard stoppers...  
> can anyone think of a case that this would not be so??  Off 
> hand I couldn't.  even if they didn't use a closing tag, as 
> in " <li>hey " , that wouldn't match because there are not 
> characters on either side of the "<>".  I can't believe though 
> that there isn't a case that this would hit wrong.  If it 
> did, maybe it would only hit once, and not be dangerous with 
> a score of just one.  (I set up the same set as popcorn, 11-57)
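
A quick way to check the reasoning quoted above is to run the "!"-optional
rule over a few strings.  This is only a minimal sketch: it uses Python's re
module rather than SpamAssassin itself, assuming the two engines agree on a
pattern this simple.  Apart from " <li>hey ", the sample strings are
hypothetical ones made up for illustration.

```python
import re

# The "!"-optional rule exactly as quoted above.
RULE = re.compile(r"[>\s]\w{1}<[\w\s\$&!-]{0,150}>\w{1}\W")

samples = {
    "unclosed_li": " <li>hey ",                 # taken from the quoted message
    "comment_split": "buy v<!-- x -->i agra",   # hypothetical obfuscation
    "tag_split": "buy v<b>i agra",              # hypothetical obfuscation
}

# True where the rule fires, False where it stays quiet.
results = {name: bool(RULE.search(text)) for name, text in samples.items()}
print(results)
```

The unclosed tag is left alone because no word character sits directly in
front of the "<", while both tag-inside-a-word samples are hit.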

I found that a lower bound of zero on the tag match, {0,150}, produces false
positives with the <BR> tag.

   <P class=update_msg_body>Thanks<BR>FName LName</P>

This is not the best example, but I think you get the point.  A lot of
legit messages have a break like the one above.  I do not recall any FPs on
other tags.  It has been difficult for me to test lately.  The rules I am
working with currently are:

   full  MY_FULL_OBFU_HTSCR /[\s>]\w+<![\w\s\-\$&!;]{0,150}>\w+/
   full  MY_FULL_OBFU_HTML  /[\s>]\w+<[\w\s\/\$&;]{6,150}>\w+/

You will notice I escaped the '-' as Kai suggested.  I also removed the
length limits on the surrounding word matches, so \w{1} became \w+ (I do not
mean word boundaries in the sense of \b).  It has been working quite nicely
since I separated the rules.  The key was not the separation itself but
starting the HTML tag match at 6.  The downside is that less spam is
matched: I received a lot of spam with fake HTML tags in the 1 to 5 word
character range.  I am about to test dropping this lower bound from 6 to 3,
which still excludes two-character tags such as <BR>.
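
To make the trade-off concrete, here is a rough sketch that varies only the
lower bound of MY_FULL_OBFU_HTML and runs it over the <BR> example.  It is
written in Python rather than as a SpamAssassin test, assuming Python's re
module treats this pattern the same way Perl does.  The helper names and the
two spam strings are hypothetical and exist only for illustration.

```python
import re

def html_rule(min_len):
    """MY_FULL_OBFU_HTML from this message, with only the lower bound varied."""
    return re.compile(r"[\s>]\w+<[\w\s\/\$&;]{%d,150}>\w+" % min_len)

# Legitimate sample from this message.
HAM = "<P class=update_msg_body>Thanks<BR>FName LName</P>"
# Hypothetical spam fragments with fake tags of 3 and 7 characters.
SPAM_3 = "cheap vi<xyz>agra here"
SPAM_7 = "cheap vi<zzzzzzz>agra here"

def hits(min_len, text):
    """Return True if the rule with the given lower bound matches the text."""
    return html_rule(min_len).search(text) is not None

for min_len in (0, 3, 6):
    print(min_len, hits(min_len, HAM), hits(min_len, SPAM_3), hits(min_len, SPAM_7))
```

With a lower bound of 0 the <BR> line is a false positive.  With 3 it is no
longer hit and the three-character fake tag is still caught.  With 6 only the
longer fake tag is caught.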


> the best thing would be to run some known ham and spam just 
> to be sure, but if nobody wants to do that (as I don't know 
> how!!  Noob!) I'll just keep watching this and let you know 
> how it goes.
> 
> Have you done any testing?  I'm not going to make a change on 
> the distribution page unless and until I'm more certain of 
> the results.
> 
> >Notwithstanding, the '\w{1}' can be changed to '\w'.  Correct?
> Yeah, it could be, but I like to see tidiness and 'sameness'. 
>  Call me OCD. :)

I agree and mentioned the same in a previous post with Keith.
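
For what it is worth, '\w{1}' and '\w' are the same thing to the regex
engine, so the choice is purely cosmetic.  A small sketch of that check,
again in Python's re module as a stand-in for Perl, with a hypothetical first
sample string:

```python
import re

# The "!"-required rule as quoted, with and without the explicit {1}.
COUNTED = re.compile(r"[>\s]\w{1}<![\w\s\$&!-]{0,150}>\w{1}\W")
PLAIN = re.compile(r"[>\s]\w<![\w\s\$&!-]{0,150}>\w\W")

SAMPLES = [
    "buy v<!-- x -->i agra",                               # hypothetical
    "<P class=update_msg_body>Thanks<BR>FName LName</P>",  # from this message
    " <li>hey ",                                           # from the quoted text
]

def span(rule, text):
    """Return the (start, end) of the first match, or None if there is none."""
    m = rule.search(text)
    return m.span() if m else None

# Both spellings should match (or not match) at exactly the same positions.
same = all(span(COUNTED, s) == span(PLAIN, s) for s in SAMPLES)
print(same)
```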


Regards,
Larry


