Where do I get the most up-to-date copy of PB&W?

Mike S


> -----Original Message-----
> From: Jennifer Wheeler [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2003 11:45 AM
> To: 'Larry Gilson'; [EMAIL PROTECTED]
> Subject: RE: [SAtalk] Popcorn, Backhair, and Weeds
> 
> 
> Ok, now I am in the light.  I think we are looking at this test from
> different perspectives.
> 
> This is what I'm replying to here...
> 
> >"My original goal was to shorten the tests into fewer tests but I think
> >I found a way to shorten the tests into one test - bonus. :)"
> 
> I am not in favor of reducing the rule set...  it was actually
> intentional to have so many rules.  :)  I will explain why.  (You are
> very welcome to change them, however, to best suit your needs.  I'm not
> arguing.)
> 
> I can't see a way to really shorten the test to one rule, without an
> increased danger of hitting on real tags.  If it were just one rule,
> then you would need to give it a very large score (because it would hit
> just once in an email, although the entire source may be filled with
> those tags).  Reducing the set to one rule takes away the power of the
> set.
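The scoring argument above can be sketched in a few lines of Python. This assumes, as the message describes, that a rule adds its score at most once per message however often its pattern occurs. The rule names, patterns, and 1.0 scores below are hypothetical stand-ins chosen for illustration; they are not the actual Popcorn, Backhair, and Weeds rules.

```python
import re

# Hypothetical rule set: each rule looks for a comment tag splitting a word,
# with a different number of letters before the tag.
RULE_SET = {
    "DEMO_SPLIT_1": (re.compile(r" \w{1}<!--[^<>]*-->\w"), 1.0),
    "DEMO_SPLIT_2": (re.compile(r" \w{2}<!--[^<>]*-->\w"), 1.0),
    "DEMO_SPLIT_3": (re.compile(r" \w{3}<!--[^<>]*-->\w"), 1.0),
}

# Hypothetical single catch-all rule covering the same cases.
ONE_RULE = {
    "DEMO_ANY_SPLIT": (re.compile(r" \w{1,7}<!--[^<>]*-->\w"), 1.0),
}


def score(message, rules):
    """Sum the scores of the rules that hit; each rule counts once at most."""
    return sum(points for pattern, points in rules.values()
               if pattern.search(message))


littered = "Buy v<!--a-->iagra and ch<!--b-->eap pills, or rem<!--c-->ove yourself"
single_tag = "Click to rem<!--c-->ove yourself"

print(score(littered, RULE_SET), score(single_tag, RULE_SET))
print(score(littered, ONE_RULE), score(single_tag, ONE_RULE))
```

In this sketch the set scores the littered message higher than the message with a single tag, while the catch-all rule gives both the same score, so it could only catch the littered message by carrying a large score that would also land on the single-tag case.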
> 
> When I was thinking of an idea to bring down the new wave of spam filled
> with these tags ("awwwww lookie.  Someone wrote a new little spamming
> program"), I realized that since there are no longer any spammy words to
> look for, and since there were not enough of the header rules violated
> to score as spam, these rules would have to match on patterns in the
> source as if they were spammy words themselves.  So I intentionally made
> the rules in a large set...idea being, look for many occurrences of
> hidden garbage tags bracketed by the right pattern of
> letters/spaces/<>...   to prevent fp-s, and it needs to occur many
> times in order to give the thing a large score.
> 
> Now, if spammers only use the tag one time in an email,
> rem<!-- missed me missed me -->ove
> ...big deal.  There are enough other hits from words, phrases, methods
> etc, to score it high, plus one more point from popcorn_33.  If they
> litter the entire source with those tags, then it basically renders
> useless most (if not all) of the "looking-for-spammy-talk" rules. In
> this case, the popcorn, backhair or weeds set steps in and takes the
> place of all the default or user defined rules that generally work in an
> email written by the normal person.
> 
> With a mix of normally typed body/selectively inserted tags, the default
> rules and the "sets" work together.
> 
> I would think that one rule trying to accomplish the same could be
> dangerous and would need a huge score to equal the scores popcorn (etc)
> gives a spam (making it even more dangerous).  The Kung Fu comes from
> the set, not just finding one of those tags.  The name, on a side note,
> comes from those tags popping up randomly in the source and obliterating
> identifiable spam lingo.
> 
> Just my opinion.  :)  
> 
> These rules are working so well, it would take a swat team to get me to
> remove them from my config file.  (And even then I might go down with
> the ship!)  I don't know if I would change them, other than what you and
> Keith have pointed out could be pared from the expression without
> changing the meaning.
> 
> I would suggest using the rules as they are (unless you are having a
> problem with them in some way).  Watch the source to see what adjustments
> spammers make, because continuing 'as is' will buy their spam a massive
> score.  We will need to add new but similar rules based on their next
> move, which is why I compulsively read the source of every spam I can
> get my hands on.
> 
> I hope that clarifies my intent with those rules. :)
> Jennifer
> 
> <snip>
> > The rules you're working on look good to me.  I have a couple 
> > questions though, I'm a little confused.  What score will you 
> > be giving the rules? And are you just trying to reduce the 
> > set to one rule?  Or are these suggestions for additional 
> > rules to supplement the others?  I just would like a frame of 
> > reference when I think about them.
> 
> I am starting by using 2 points per test.  My original goal was to
> shorten the tests into fewer tests but I think I found a way to shorten
> the tests into one test - bonus. :)  I have changed the test since my
> message.  I had
> 
>   / \w{1,7}<\/?[\w\W]{0,150}>\w{1,7}/
> 
> This created some false positives in that it would literally catch
> anything between the first word and the last.  This would mean it would
> skip over other legitimate tags until the test matched '>word'.  This
> was not good.  So I changed it to:
> 
>   / \w{1,7}<\/?[^<>]{0,150}>\w{1,7}/
> 
> This one seems to be working well so far.  It will catch any normal and
> funky stuff within the tags but makes sure it will not run over any
> subsequent tags.
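The difference between the two expressions can be checked with a short Python sketch. The patterns are copied from the message without the Perl `/.../` delimiters, on the assumption that Python's `re` module treats these particular constructs the same way Perl does. Both sample strings are hypothetical.

```python
import re

# First attempt: [\w\W] matches any character, including "<" and ">".
GREEDY = re.compile(r" \w{1,7}<\/?[\w\W]{0,150}>\w{1,7}")

# Revised test: [^<>] stops the match at the end of the first tag.
BOUNDED = re.compile(r" \w{1,7}<\/?[^<>]{0,150}>\w{1,7}")

# Hypothetical legitimate HTML: a word, two real tags, then another word.
legit = "See the notes</p><p>before replying."

# Hypothetical spam: a comment tag splitting the word "remove".
spam = "Click here to rem<!-- missed me -->ove yourself."

for name, pattern in (("greedy", GREEDY), ("bounded", BOUNDED)):
    for label, text in (("legit", legit), ("spam", spam)):
        hit = pattern.search(text)
        print(name, label, repr(hit.group(0)) if hit else None)
```

On these samples the first expression runs across `</p><p>` and hits the legitimate text, while the revised expression hits only the word that is split by a single tag.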
> 
> The second rule:
> 
>   /<!?-?-? ?\w{7,} ?-?-?>/
> 
> Is just pattern matching and really reinforces the above test in a
> subset of spam messages that the above will match.
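A quick way to see what the second rule does and does not hit is to run it over a few strings. This is a minimal sketch in Python, again assuming `re` behaves like Perl for this pattern; the sample tags are hypothetical.

```python
import re

# Second rule from the message, without the Perl /.../ delimiters.
SECOND_RULE = re.compile(r"<!?-?-? ?\w{7,} ?-?-?>")


def hits(text):
    """Return True if the second rule matches anywhere in text."""
    return SECOND_RULE.search(text) is not None


samples = [
    "<!--zqxjkvbw-->",       # comment holding one long run of letters
    "<!-- qwertyuiop -->",   # same, with the optional spaces
    "<qzxvbnmw>",            # bare made-up tag
    "<b>",                   # short real tag
    "<!-- missed me -->",    # words shorter than seven characters
    "<blockquote>",          # real tag with a long name
]

for text in samples:
    print(text, hits(text))
```

Because `\w{7,}` needs at least seven consecutive word characters, comments made of short words are not matched, while a real tag with a long name such as `<blockquote>` is. That is worth keeping in mind when choosing a score for this rule on its own.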
> 
> <snip>
> 
> 
> 
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> SourceForge.net hosts over 70,000 Open Source Projects.
> See the people who have HELPED US provide better services:
> Click here: http://sourceforge.net/supporters.php
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> 


