Ok, now I am in the light.  I think we are looking at this test from
different perspectives.

This is what I'm replying to here...

>"My original goal was to shorten the tests into fewer tests but I think
>I found a way to shorten the tests into one test - bonus. :)"

I am not in favor of reducing the rule set...  it was actually
intentional to have so many rules.  :)  I will explain why.  (You are
very welcome to change them, however, to best suit your needs.  I'm not
arguing.)

I can't see a way to really shorten the test to one rule without an
increased danger of hitting on real tags.  If it were just one rule,
then you would need to give it a very large score (because it would hit
just once in an email, although the entire source may be filled with
those tags).  Reducing the set to one rule takes away the power of the
set.

When I was thinking of an idea to bring down the new wave of spam filled
with these tags ("awwwww lookie.  Someone wrote a new little spamming
program"), I realized that since there are no longer any spammy words to
look for, and since not enough of the header rules were violated to
score as spam, these rules would have to match on patterns in the
source as if they were spammy words themselves.  So I intentionally made
the rules in a large set.  The idea is to look for many occurrences of
hidden garbage tags bracketed by the right pattern of
letters/spaces/<>: the bracketing prevents fp-s, and the pattern needs
to occur many times in order to give the thing a large score.

Now, if spammers only use the tag one time in an email,
rem<!-- missed me missed me -->ove
...big deal.  There are enough other hits from words, phrases, methods,
etc. to score it high, plus one more point from popcorn_33.  If they
litter the entire source with those tags, then it basically renders
useless most (if not all) of the "looking-for-spammy-talk" rules.  In
this case, the popcorn, backhair or weeds set steps in and takes the
place of all the default or user-defined rules that generally work on an
email written by a normal person.

With a mix of normally typed body/selectively inserted tags, the default
rules and the "sets" work together.  
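
To make the argument above concrete, here is a rough Python sketch.  The
rule names, the patterns and the 1.0 score are all made up for
illustration (the real popcorn/backhair/weeds rules are not quoted in
this message).  The only thing it tries to show is that a rule counts
once per message, so a set of narrow rules adds up on a littered source
while a single broad rule scores the same either way.

```python
import re

# Hypothetical stand-ins for a rule set: one rule per count of letters
# sitting directly in front of an embedded comment tag.
DEMO_RULES = {
    f"DEMO_TAG_{n}": re.compile(
        rf"\b[a-z]{{{n}}}<!--[^<>]{{0,150}}-->[a-z]", re.I)
    for n in range(1, 6)
}

# Hypothetical single broad rule covering all of the above at once.
SINGLE_RULE = re.compile(r"\b[a-z]{1,7}<!--[^<>]{0,150}-->[a-z]", re.I)

SCORE_PER_RULE = 1.0  # assumed value, not taken from any real config


def set_score(body: str) -> float:
    """Each rule in the set adds its score once if it hits anywhere."""
    return sum(SCORE_PER_RULE for rx in DEMO_RULES.values() if rx.search(body))


def single_score(body: str) -> float:
    """One rule hits once per message, however many tags there are."""
    return SCORE_PER_RULE if SINGLE_RULE.search(body) else 0.0


one_tag = "Click to rem<!-- missed me -->ove yourself from this list."
littered = ("F<!--a-->ree mo<!--b-->ney, lim<!--c-->ited "
            "offe<!--d-->r, guara<!--e-->nteed")

for name, body in (("one_tag", one_tag), ("littered", littered)):
    print(name, "set:", set_score(body), "single:", single_score(body))
```

With these invented samples the set gives the littered message five
times the score of the one-tag message, while the single rule cannot
tell them apart.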

I would think that one rule trying to accomplish the same could be
dangerous and would need a huge score to equal the scores popcorn (etc.)
gives a spam (making it even more dangerous).  The Kung Fu comes from
the set, not just from finding one of those tags.  The name, on a side
note, comes from those tags popping up randomly in the source and
obliterating identifiable spam lingo.

Just my opinion.  :)  

These rules are working so well, it would take a SWAT team to get me to
remove them from my config file.  (And even then I might go down with
the ship!)  I don't know if I would change them, other than what you and
Keith have pointed out could be pared from the expression without
changing the meaning.

I would suggest using the rules as they are (unless you are having a
problem with them in some way).  Watch the source to see what adjustments
spammers make, because continuing 'as is' will buy their spam a massive
score.  We will need to add new but similar rules based on their next
move, which is why I compulsively read the source of every spam I can
get my hands on.  

I hope that clarifies my intent with those rules. :)
Jennifer

<snip>
> The rules you're working on look good to me.  I have a couple 
> questions though, I'm a little confused.  What score will you 
> be giving the rules? And are you just trying to reduce the 
> set to one rule?  Or are these suggestions for additional 
> rules to supplement the others?  I just would like a frame of 
> reference when I think about them.

I am starting by using 2 points per test.  My original goal was to
shorten the tests into fewer tests but I think I found a way to shorten
the tests into one test - bonus. :)  I have changed the test since my
message.  I had

  / \w{1,7}<\/?[\w\W]{0,150}>\w{1,7}/

This created some false positives in that it would literally catch
anything between the first word and the last.  This would mean it would
skip over other legitimate tags until the test matched '>word'.  This
was not good.  So I changed it to:

  / \w{1,7}<\/?[^<>]{0,150}>\w{1,7}/

This one seems to be working well so far.  It will catch any normal and
funky stuff within the tags but makes sure it will not run over any
subsequent tags.
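
A quick way to see the difference between the two versions is to run
both over a harmless line and a spammy one.  This is only a Python
sketch: the patterns use constructs that behave the same in Python's re
module as in Perl, and both sample strings are invented for
illustration.

```python
import re

# First version: [\w\W] matches anything, including '<' and '>'.
old_rule = re.compile(r" \w{1,7}</?[\w\W]{0,150}>\w{1,7}")
# Revised version: [^<>] cannot cross into a following tag.
new_rule = re.compile(r" \w{1,7}</?[^<>]{0,150}>\w{1,7}")

# Invented samples: ordinary markup vs. a word split by a hidden comment.
legit = "Please see<br> the <b>manual</b> today."
spam = "Click to rem<!-- missed me missed me -->ove yourself."

old_hit = old_rule.search(legit)
print("old on legit:", repr(old_hit.group(0)) if old_hit else None)
print("new on legit:", new_rule.search(legit))
print("new on spam: ", repr(new_rule.search(spam).group(0)))
```

On the harmless line the first version starts at " see<br>" and runs on
through the next tag until it finds '>manual'; the revised version stops
at the first '>' and, finding a space after it, does not match.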

The second rule:

  /<!?-?-? ?\w{7,} ?-?-?>/

Is just pattern matching and really reinforces the above test in a
subset of the spam messages that the above will match.
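
For reference, here is a small Python sketch of what the second rule
does and does not catch.  The sample tags are invented, and the pattern
is simply the one above written for Python's re module.

```python
import re

# Second rule: a tag holding one "word" of 7 or more characters,
# optionally wrapped in comment dashes.
second_rule = re.compile(r"<!?-?-? ?\w{7,} ?-?-?>")

# Invented samples for illustration only.
samples = [
    "<!-- qwzxkjvbn -->",            # single long garbage word in a comment
    "<xkqjzvwpl>",                   # made-up tag name
    "<!-- missed me missed me -->",  # several short words
    "<b>",                           # short real tag
    "<table>",                       # real tag shorter than 7 letters
    "<blockquote>",                  # real tag of 7+ letters
]

for s in samples:
    print(f"{s!r:32} {'hit' if second_rule.search(s) else 'miss'}")
```

Two things stand out in this sketch: the multi-word comment is missed,
which fits the "subset" remark above, and a real tag with a long name
such as <blockquote> is hit as well, so the score for this rule
probably wants to stay modest.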

<snip>





_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
