Ok, now I am in the light. I think we are looking at this test from different perspectives.
This is what I'm replying to here...

> "My original goal was to shorten the tests into fewer tests but I think I
> found a way to shorten the tests into one test - bonus. :)"

I am not in favor of reducing the rule set... it was actually intentional to have so many rules. :) I will explain why. (You are very welcome to change them, however, to best suit your needs. I'm not arguing.)

I can't see a way to really shorten the test to one rule without an increased danger of hitting on real tags. If it were just one rule, then you would need to give it a very large score (because it would hit just once in an email, although the entire source may be filled with those tags). Reducing the set to one rule takes away the power of the set.

When I was thinking of an idea to bring down the new wave of spam filled with these tags ("awwwww lookie. Someone wrote a new little spamming program"), I realized that since there are no longer any spammy words to look for, and since there were not enough of the header rules violated to score as spam, these rules would have to match on patterns in the source as if they were spammy words themselves. So I intentionally made the rules in a large set... the idea being to look for many occurrences of hidden garbage tags bracketed by the right pattern of letters/spaces/<>... to prevent FPs, and the pattern needs to occur many times in order to give the thing a large score.

Now, if a spammer uses the tag only one time in an email, rem<!-- missed me missed me -->ove... big deal. There are enough other hits from words, phrases, methods etc. to score it high, plus one more point from popcorn_33. If they litter the entire source with those tags, then it basically renders useless most (if not all) of the "looking-for-spammy-talk" rules. In this case, the popcorn, backhair or weeds set steps in and takes the place of all the default or user-defined rules that generally work in an email written by a normal person.
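To make the set-versus-single-rule argument concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the `DEMO_SET_*` rule names, the idea of one narrow rule per letter that precedes the tag, and the 1-point score are hypothetical and are not the actual popcorn, backhair or weeds rules. It relies only on the point made above, that a rule counts once per message however often it matches.

```python
import re

# Hypothetical stand-in for a rule set: one narrow rule per lowercase letter
# that directly precedes an embedded tag. NOT the real popcorn rules.
RULE_SET = {
    f"DEMO_SET_{ch.upper()}": re.compile(rf"{ch}<!?[^<>]{{0,150}}>[a-z]")
    for ch in "abcdefghijklmnopqrstuvwxyz"
}

# Hypothetical single catch-all rule covering the same ground.
SINGLE_RULE = re.compile(r"[a-z]<!?[^<>]{0,150}>[a-z]")

POINTS_PER_RULE = 1.0  # assumed score, chosen only for the demonstration


def score_with_set(source: str) -> float:
    """Each rule in the set contributes at most once per message."""
    return sum(POINTS_PER_RULE for rx in RULE_SET.values() if rx.search(source))


def score_with_single_rule(source: str) -> float:
    """A single rule contributes once, no matter how many tags there are."""
    return POINTS_PER_RULE if SINGLE_RULE.search(source) else 0.0


one_tag = "Please rem<!-- missed me -->ove me from the list."
littered = ("fr<!-- a -->ee mo<!-- b -->ney, cl<!-- c -->ick "
            "he<!-- d -->re, no<!-- e -->w")

if __name__ == "__main__":
    for name, text in (("one_tag", one_tag), ("littered", littered)):
        print(name, score_with_set(text), score_with_single_rule(text))
```

With these made-up rules, the message containing one tag scores the same either way, while the littered message keeps gaining points under the set and stays at one hit under the single rule. To keep up, the single rule would need a much larger score, and that score would then also land on a message with one innocent match.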
With a mix of normally typed body and selectively inserted tags, the default rules and the "sets" work together. I would think that one rule trying to accomplish the same could be dangerous and would need a huge score to equal the scores popcorn (etc.) gives a spam (making it even more dangerous). The Kung Fu comes from the set, not just from finding one of those tags. The name, on a side note, comes from those tags popping up randomly in the source and obliterating identifiable spam lingo.

Just my opinion. :) These rules are working so well, it would take a SWAT team to get me to remove them from my config file. (And even then I might go down with the ship!) I don't know if I would change them, other than paring from the expression what you and Keith have pointed out could be removed without changing the meaning.

I would suggest using the rules as they are (unless you are having a problem with them in some way). Watch the source to see what adjustments spammers make, because continuing 'as is' will buy their spam a massive score. We will need to add new but similar rules based on their next move, which is why I compulsively read the source of every spam I can get my hands on.

I hope that clarifies my intent with those rules. :)

Jennifer

<snip>

> The rules you're working on look good to me. I have a couple
> questions though, I'm a little confused. What score will you
> be giving the rules? And are you just trying to reduce the
> set to one rule? Or are these suggestions for additional
> rules to supplement the others? I just would like a frame of
> reference when I think about them.

I am starting by using 2 points per test. My original goal was to shorten the tests into fewer tests but I think I found a way to shorten the tests into one test - bonus. :)

I have changed the test since my message. I had

/ \w{1,7}<\/?[\w\W]{0,150}>\w{1,7}/

This created some false positives: because [\w\W] matches any character, including < and >, it would literally catch anything between the first word and the last.
This would mean it would skip over other legitimate tags until the test matched '>word'. This was not good. So I changed it to:

/ \w{1,7}<\/?[^<>]{0,150}>\w{1,7}/

This one seems to be working well so far. It will catch any normal and funky stuff within the tags but makes sure it will not run over any subsequent tags.

The second rule:

/<!?-?-? ?\w{7,} ?-?-?>/

is just pattern matching and really reinforces the above test in a subset of the spam messages that the above will match.

<snip>

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
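The difference between the two versions of the first test, and what the second rule picks up, can be checked with a small sketch. It uses Python's `re` module, not SpamAssassin itself, on the assumption that these particular patterns behave the same way there. The sample strings and the names `GREEDY`, `BOUNDED` and `SECOND` are made up for the demonstration.

```python
import re

# First version of the test: [\w\W] matches ANY character, including < and >.
GREEDY = re.compile(r" \w{1,7}<\/?[\w\W]{0,150}>\w{1,7}")

# Revised version: [^<>] cannot cross into a following tag.
BOUNDED = re.compile(r" \w{1,7}<\/?[^<>]{0,150}>\w{1,7}")

# The second rule: a tag holding one run of 7 or more word characters.
SECOND = re.compile(r"<!?-?-? ?\w{7,} ?-?-?>")

# Made-up sample inputs (assumptions, not taken from real mail).
legit = "<b>very bold</b> and <i>italic</i> text"
spam = "Please rem<!-- missed me missed me -->ove me"


def matched_text(rx: re.Pattern, text: str):
    """Return the matched substring, or None when the rule does not hit."""
    m = rx.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # The old pattern runs from ' bold</b>' across to '<i>italic'.
    print(repr(matched_text(GREEDY, legit)))
    # The revised pattern leaves the legitimate markup alone...
    print(repr(matched_text(BOUNDED, legit)))
    # ...but still catches the word split by a garbage tag.
    print(repr(matched_text(BOUNDED, spam)))
    print(repr(matched_text(SECOND, "fr<!--qzjxkvw-->ee")))
```

On the legitimate sample, the first version matches ` bold</b> and <i>italic`, which is the false positive described above, while the revised version does not match at all. Both versions still match the split word in the spam sample. One thing that may be worth checking against real mail: in this Python translation, the second rule also matches a bare `<blockquote>`, since the tag name is a run of more than seven word characters.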