Hi Jennifer,

> -----Original Message-----
> From: jennifer
> Sent: Sunday, October 19, 2003 7:03 PM
> To: 'Larry Gilson'; [EMAIL PROTECTED]
> Subject: RE: [SAtalk] [RD] Popcorn, Backhair, and Weeds
>
>
> Hi Larry,
>
> (I added RD since this has turned into rule discussion. Hope that is
> ok)

Not a problem. I am still not used to having to hit 'reply to all'. Most
lists I belong to inject a 'Reply-To:' header with the list's email address.

> Well we might have inadvertently merged popcorn and backhair.
> You gave two suggestions, and because I've been so busy, I
> didn't immediately get around to testing. Yesterday I
> started testing your consolidation, but I used the wrong one.
> You corrected your second rule when you were showing me an
> easier way to type the expression I had written (when you
> moved the "!" outside the set.)
>
> /[>\s]\w{1}<![\w\s\$&!-]{0,150}>\w{1}\W/
>
> but when I was going through emails to find the rule to test,
> I accidentally tested your first edit, which made the "!" optional.
>
> /[>\s]\w{1}<[\w\s\$&!-]{0,150}>\w{1}\W/
>
> (using a score of zero, and also running the other two sets,
> popcorn & backhair). In every email I've checked today, that
> rule is matching rule for rule with popcorn, and the same
> thing with backhair. I haven't seen any false positives, so
> it could very well be that this will work. ?? I think the
> thing that is keeping it safe is that this is an html tag
> bracketed by characters, rather than tags bracketing
> words...which should be the case with html. The ends of the
> expression basically are looking for safeguard stoppers...
> can anyone think of a case that this would not be so?? Off
> hand I couldn't. Even if they didn't use a closing tag, as
> in " <li>hey " , that wouldn't match because there are not
> characters on either side of the "<>". I can't believe though
> that there isn't a case that this would hit wrong. If it
> did, maybe it would only hit once, and not be dangerous with
> a score of just one. (I set up the same set as popcorn, 11-57)

I found that using a lower bound of zero, {0,150}, for the tag contents
produces false positives with the <BR> tag.

<P class=update_msg_body>Thanks<BR>FName LName</P>

This is not the best example but I think you can get the point.
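
To make the false positive concrete, here is a minimal sketch in Python, whose `re` module stands in for the Perl regexes SpamAssassin actually uses. It makes two assumptions that are not in the thread: the "any length" pattern is simply the quoted rule with `\w{1}` relaxed to `\w+`, and the obfuscated string is made up for illustration. As far as this sketch can tell, the quoted single-character form does not fire on the sample line, because "Thanks" is longer than one character; the relaxed form does.

```python
import re

# Sample body line from the message above.
SAMPLE = "<P class=update_msg_body>Thanks<BR>FName LName</P>"

# Quoted rule (the form without the literal "!"), as posted: exactly one
# word character on each side of the tag.
SINGLE_CHAR = re.compile(r"[>\s]\w{1}<[\w\s\$&!-]{0,150}>\w{1}\W")

# Assumed variant (not from the thread): same rule with \w{1} relaxed to \w+.
ANY_LENGTH = re.compile(r"[>\s]\w+<[\w\s\$&!-]{0,150}>\w+\W")

# Made-up obfuscated text, for illustration only.
OBFUSCATED = "Buy v<zq>i<zq>agra now"

single_on_sample = SINGLE_CHAR.search(SAMPLE)    # None: "Thanks" is too long
any_on_sample = ANY_LENGTH.search(SAMPLE)        # matches around <BR>
single_on_spam = SINGLE_CHAR.search(OBFUSCATED)  # matches around the first <zq>

print(single_on_sample, any_on_sample, single_on_spam)
```

The point of the comparison is that the zero lower bound only becomes a problem for ordinary markup once the words on either side of the tag are allowed to be longer than one character.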
A lot of legit messages have a break like the one above. I do not recall any
FPs on other tags. It has been difficult for me to test lately. The rules I
am working with currently are:

full MY_FULL_OBFU_HTSCR /[\s>]\w+<![\w\s\-\$&!;]{0,150}>\w+/
full MY_FULL_OBFU_HTML  /[\s>]\w+<[\w\s\/\$&;]{6,150}>\w+/

You will notice I escaped the '-' as Kai suggested. I also took the {1}
limits off the \w matches on either side of the tag (word "boundaries" in
the loose sense, obviously not in the context of \b). It is working quite
nicely since I separated the rules. The separation was not the key, though;
starting the HTML tag match at 6 was. The downside is that less spam is
matched. I received a lot of spam with fake HTML tags in the 1 to 5 word
character range. I am about to test dropping this match start point from 6
to 3, which still avoids 2-character tags such as <BR>.

> the best thing would be to run some known ham and spam just
> to be sure, but if nobody wants to do that (as I don't know
> how!! Noob!) I'll just keep watching this and let you know
> how it goes.
>
> Have you done any testing? I'm not going to make a change on
> the distribution page unless and until I'm more certain of
> the results.
>
> >Not withstanding the '\w{1}' can be changed to '\w'. Correct?
> Yeah, it could be, but I like to see tidiness and 'sameness'.
> Call me OCD. :)

I agree and mentioned the same in a previous post with Keith.

Regards,
Larry
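
As a footnote to the rules above, the trade-off in the lower bound can be sketched with a small Python script (again using Python's `re` as a stand-in for Perl). The helper `html_rule` and the sample strings are hypothetical and only for illustration; the pattern itself is the MY_FULL_OBFU_HTML expression with the lower bound made a parameter.

```python
import re

def html_rule(min_len):
    """Build the MY_FULL_OBFU_HTML-style pattern with a given lower bound."""
    return re.compile(r"[\s>]\w+<[\w\s\/\$&;]{%d,150}>\w+" % min_len)

# Hypothetical samples: one legitimate line break and two fake tags.
SAMPLES = {
    "legit <BR>": "<P>Thanks<BR>FName LName</P>",
    "fake 4-char tag": "get vi<zzqx>agra today",
    "fake 8-char tag": "get vi<zzqxkwpt>agra today",
}

# For each candidate lower bound, record which samples the rule hits.
results = {}
for min_len in (0, 3, 6):
    rule = html_rule(min_len)
    results[min_len] = [name for name, text in SAMPLES.items()
                        if rule.search(text)]

for min_len, hits in results.items():
    print(min_len, hits)
```

On these samples, a lower bound of 0 hits the legitimate `<BR>` line, 6 avoids it but also misses the 4-character fake tag, and 3 avoids `<BR>` while still catching both fake tags. Real ham and spam corpora would be needed to confirm that this holds beyond toy strings.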