Where do I get the most up-to-date copy of PB&W?

Mike S
> -----Original Message-----
> From: Jennifer Wheeler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2003 11:45 AM
> To: 'Larry Gilson'; [EMAIL PROTECTED]
> Subject: RE: [SAtalk] Popcorn, Backhair, and Weeds
>
> Ok, now I am in the light. I think we are looking at this test from
> different perspectives.
>
> This is what I'm replying to here...
>
> >"My original goal was to shorten the tests into fewer tests but I think I
> >found a way to shorten the tests into one test - bonus. :)"
>
> I am not in favor of reducing the rule set... it was actually
> intentional to have so many rules. :) I will explain why. (You are
> very welcome to change them, however, to best suit your needs. I'm not
> arguing.)
>
> I can't see a way to really shorten the test to one rule without an
> increased danger of hitting on real tags. If it were just one rule,
> then you would need to give it a very large score (because it would hit
> just once in an email, although the entire source may be filled with
> those tags). Reducing the set to one rule takes away the power of the
> set.
>
> When I was thinking of an idea to bring down the new wave of spam filled
> with these tags ("awwwww lookie. Someone wrote a new little spamming
> program"), I realized that since there are no longer any spammy words to
> look for, and since there were not enough of the header rules violated
> to score as spam, these rules would have to match on patterns in the
> source as if they were spammy words themselves. So I intentionally made
> the rules in a large set... the idea being: look for many occurrences of
> hidden garbage tags bracketed by the right pattern of
> letters/spaces/<>... to prevent fp-s, and it needs to occur many
> times in order to give the thing a large score.
>
> Now, if spammers only use the tag one time in an email,
> rem<!-- missed me missed me -->ove
> ...big deal. There are enough other hits from words, phrases, methods
> etc. to score it high, plus one more point from popcorn_33. If they
> litter the entire source with those tags, then it basically renders
> useless most (if not all) of the "looking-for-spammy-talk" rules. In
> this case, the popcorn, backhair or weeds set steps in and takes the
> place of all the default or user-defined rules that generally work in an
> email written by the normal person.
>
> With a mix of normally typed body/selectively inserted tags, the default
> rules and the "sets" work together.
>
> I would think that one rule trying to accomplish the same could be
> dangerous and would need a huge score to equal the scores popcorn (etc.)
> gives a spam (making it even more dangerous). The Kung Fu comes from
> the set, not just finding one of those tags. The name, on a side note,
> comes from those tags popping up randomly in the source and obliterating
> identifiable spam lingo.
>
> Just my opinion. :)
>
> These rules are working so well, it would take a SWAT team to get me to
> remove them from my config file. (And even then I might go down with
> the ship!) I don't know if I would change them, other than what you and
> Keith have pointed out could be pared from the expression without
> changing the meaning.
>
> I would suggest using the rules as they are (unless you are having a
> problem with them in some way). Watch the source to see what adjustments
> spammers make, because continuing 'as is' will buy their spam a massive
> score. We will need to add new but similar rules based on their next
> move, which is why I compulsively read the source of every spam I can
> get my hands on.
>
> I hope that clarifies my intent with those rules. :)
> Jennifer
>
> <snip>
> > The rules you're working on look good to me. I have a couple
> > questions though, I'm a little confused. What score will you
> > be giving the rules? And are you just trying to reduce the
> > set to one rule? Or are these suggestions for additional
> > rules to supplement the others? I just would like a frame of
> > reference when I think about them.
>
> I am starting by using 2 points per test. My original goal was to shorten
> the tests into fewer tests but I think I found a way to shorten the tests
> into one test - bonus. :) I have changed the test since my message. I
> had
>
> / \w{1,7}<\/?[\w\W]{0,150}>\w{1,7}/
>
> This created some false positives in that it would literally catch
> anything between the first word and the last. This would mean it would
> skip over other legitimate tags until the test matched '>word'. This was
> not good. So I changed it to:
>
> / \w{1,7}<\/?[^<>]{0,150}>\w{1,7}/
>
> This one seems to be working well so far. It will catch any normal and
> funky stuff within the tags but makes sure it will not run over any
> subsequent tags.
>
> The second rule:
>
> /<!?-?-? ?\w{7,} ?-?-?>/
>
> is just pattern matching and really reinforces the above test in a
> subset of the spam messages that the above will match.
>
> <snip>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> SourceForge.net hosts over 70,000 Open Source Projects.
> See the people who have HELPED US provide better services:
> Click here: http://sourceforge.net/supporters.php
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
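
For anyone who wants to see the difference between the two versions of the first rule outside of SpamAssassin, here is a minimal sketch in Python. The three patterns are copied from the message above (they are Perl regexes, but Python's re module reads them the same way). Everything else is an assumption for illustration: the variable names are mine, and the "spam" and "ham" strings are made-up samples, not taken from real mail.

```python
import re

# Patterns from the thread, with the Perl /.../ delimiters dropped.
FIRST_TRY = re.compile(r" \w{1,7}</?[\w\W]{0,150}>\w{1,7}")  # original version
REVISED = re.compile(r" \w{1,7}</?[^<>]{0,150}>\w{1,7}")     # stops at the next < or >
SECOND_RULE = re.compile(r"<!?-?-? ?\w{7,} ?-?-?>")          # one long junk word in a tag

# Hypothetical samples (not from real messages).
spam = "please rem<!-- missed me missed me -->ove me"
ham = "<p>The total<br> is <b>ten</b> dollars</p>"
junk = "buy<!-- qwertyuiop -->now"

for name, text in (("spam", spam), ("ham", ham), ("junk", junk)):
    print(
        name,
        "first_try:", bool(FIRST_TRY.search(text)),
        "revised:", bool(REVISED.search(text)),
        "second_rule:", bool(SECOND_RULE.search(text)),
    )

# Show what the original version swallowed in the legitimate sample:
# it runs across <br> and <b> until it finds a '>' followed by a word.
print(repr(FIRST_TRY.search(ham).group(0)))
```

On the ham sample the original pattern matches across two ordinary tags, which is the false positive described above, while the revised pattern does not match because [^<>] cannot cross into the next tag. Both versions still match the word split by a comment in the spam sample.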