I don't mind at all that you're scrutinizing the rules  :)  I would love
it if someone wanted to improve them.

>> Each of the words uses \w{#}?, so if you have \w{5}?, you would be saying
>>either 0 or 5 occurrences of [a-zA-Z0-9_].

From what I understand, placing a ? after {n} does not mean "match 0 or
n times" in this format.  {n}? is just the non-greedy form of {n}, and since
{n} already means exactly n times, the ? doesn't change anything.  So that
segment matches exactly 5 word characters before the hidden tag.  Someone
correct me if I'm wrong.
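A quick way to check this is to try the quantifiers directly.  The sketch
below uses Python's re module as a stand-in for Perl (SpamAssassin rules are
Perl regexes); for simple quantifiers like these the two should behave the
same.  The pattern names are made up for illustration.  It compares \w{5}?
with plain \w{5} and with (?:\w{5})?, which is how "0 or 5 occurrences" would
actually have to be written.

```python
import re

# \w{5}? is the non-greedy (lazy) form of \w{5}. A fixed count leaves nothing
# to be lazy about, so it still needs exactly 5 word characters.
LAZY_FIVE = re.compile(r"\w{5}?")
PLAIN_FIVE = re.compile(r"\w{5}")
# "0 or 5 occurrences" would have to be written as an optional group instead.
ZERO_OR_FIVE = re.compile(r"(?:\w{5})?")

for sample in ["", "abc", "hello", "hello_world"]:
    print(
        repr(sample),
        LAZY_FIVE.fullmatch(sample) is not None,
        PLAIN_FIVE.fullmatch(sample) is not None,
        ZERO_OR_FIVE.fullmatch(sample) is not None,
    )
```

Only the optional group accepts the empty string, so \w{5}? can't match
zero letters in front of the tag.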

>> So is it possible that you would
>>encounter a situation in which you would find:

>>0 word - tag - 0 word

>><html><body bgcolor="#FFFFFF"><center>
>>The match would be on <center>:

Not that I've seen.  It's looking for > or a space, then some letters ({n}?
means exactly n), then the tag, so that won't match.  It won't match on
<center> because the \w{5}? needs exactly 5 letters right before the hidden
tag (a tag stuffed with meaningless letters to obscure a word, like the v
word) and 1-7 letters following the tag, then a space, period, etc.  In your
examples there are no letters at all between the > and <center>.
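Here is a minimal sketch of that, again using Python's re in place of Perl and
assuming the rule text is exactly the 5-letter version quoted in this thread.
The "obfuscated" string is a made-up example of a junk tag splitting a word;
the other two are Larry's snippets.

```python
import re

# The 5-letter rule quoted in this thread, as a Python pattern.
RULE_5 = re.compile(r"(\>|\s)\w{5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{5}?(\s|\W|\<)")

SAMPLES = {
    # Larry's examples: nothing sits between ">" and "<center>".
    "html_center": '<html><body bgcolor="#FFFFFF"><center>\n',
    "p_center": "<P><CENTER><SMALL>",
    # Hypothetical spam: a junk tag splits "helloworld" into two 5-letter halves.
    "obfuscated": "Buy hello<qwertyzx>world now",
}

for name, text in SAMPLES.items():
    print(name, RULE_5.search(text) is not None)
```

Only the obfuscated sample should hit, because it is the only one with five
letters on each side of the tag.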

Each rule hits just one occurrence of an obscured word.  The reason I
split them up into so many rules is that I like to raise scores
cautiously.  I was just trying to avoid false positives by giving many
occurrences low scores rather than giving one occurrence a large score.
Not sure if my thinking is valid.

>> I encountered a false positive (on a variant of your rules) as I tried to
>>reduce the number of tests down to one.  The result was as follows:
>>  /(\>|\s)\w{0,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{0,7}(\s|\W|\<)/

>>I think I need to change from \w{0,7} to \w{1,7}; ..

If you only want to use one popcorn rule and give it a higher
score, then yes, you could change the range on both sides of the hidden
tag to \w{1,7}, leaving the rest of the expression intact.  I didn't test
it, but I think that should work.  In that case, you could probably just
raise the score of the obfu comment rule in default SpamAssassin.  I haven't
looked at it to see if it's looking for the same thing as these.  I just
prefer smaller scores for rules.  Your idea is good though, because there
have been a few occasions when spammers only use the hidden tag in the
remove-me link, so a hefty score would boost those nicely.  Up to this
point, in those cases, the rest of the SpamAssassin rules scored enough on
their own; these just boosted it higher.  In my case, I might just end
up leaving these rules low and boosting the default rule (I trust those
rules more than mine!)
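A minimal sketch of the single-rule change, untested against real mail and
again using Python's re in place of Perl.  Larry's \w{0,7} variant fires on
his <center> snippets because zero letters are allowed on both sides of the
tag; requiring \w{1,7} should stop that while still catching a word split by
a junk tag.  The spam string is hypothetical.

```python
import re

# Larry's single-rule variant: 0-7 word characters on each side of the tag.
LOOSE = re.compile(r"(\>|\s)\w{0,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{0,7}(\s|\W|\<)")
# Same expression with at least one word character required on each side.
TIGHT = re.compile(r"(\>|\s)\w{1,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{1,7}(\s|\W|\<)")

legit = ['<html><body bgcolor="#FFFFFF"><center>\n', "<P><CENTER><SMALL>"]
spam = "Buy hello<qwertyzx>world now"  # hypothetical obfuscated word

for text in legit + [spam]:
    print(repr(text), LOOSE.search(text) is not None, TIGHT.search(text) is not None)
```

LOOSE hits all three strings; TIGHT hits only the spam one.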

>> One last question.  Are any of the upper limits necessary?  Spammers may
>>just want to keep upping the limit.  Would it be beneficial to modify
>>[\w\s]{6,150} to [\w\s]{6,}; etc.?

Nah, the upper limits are not necessary... and you're probably right.  I
set them because I read that not setting an upper limit eats up more
memory. I don't know by how much, I was just being cautious and they were
working well in this range.  If they start increasing the amount of
garbage, you could up that range, or just do as you say and not set an
upper limit. {n,} or maybe even empty tag.
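To see what the upper limit actually changes, here is a sketch comparing the
bounded {6,150} tag content with an open-ended {6,}.  It only shows the
matching difference; it doesn't measure any memory or CPU cost.  Python's re
stands in for Perl, and obfuscated() is a hypothetical helper that builds
test text.

```python
import re

BOUNDED = re.compile(r"(\>|\s)\w{5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{5}?(\s|\W|\<)")
UNBOUNDED = re.compile(r"(\>|\s)\w{5}?\<\/?\s?[\w\s]{6,}\/?\s?\>\w{5}?(\s|\W|\<)")


def obfuscated(junk_length: int) -> str:
    """Hypothetical spam text whose hidden tag holds junk_length filler letters."""
    return "Buy hello<" + "x" * junk_length + ">world now"


for n in (100, 150, 200):
    text = obfuscated(n)
    print(n, BOUNDED.search(text) is not None, UNBOUNDED.search(text) is not None)
```

Once the junk inside the tag grows past 150 characters, only the open-ended
version still hits.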

>> Overall, the rules are a great addition and have been helping a
>>tremendously.  I hope you do not find me overbearing by picking at the
>>rules.  I think they are great and that is why I am spending some time
>>with them.  Thanks again!

Not at all!!  :)  Like I said, I'm new to this and I basically just work
these like a puzzle until they do what I want.  I feel a little awkward
answering questions when there are so many people on this list far more
qualified!!  Someone jump in if I'm on Pluto!

I'm glad they're working out for you!  Let me know if you come up with
some killer variation.  I'm sure they'll need to be modified as spammers
vary their techniques.

Thanks for the input,
Jennifer


>>Regards,
>>Larry


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Larry
Gilson
Sent: Friday, October 10, 2003 1:41 PM
To: 'Spamassassin-Talk (E-mail)'
Subject: RE: [SAtalk] Popcorn, Backhair, and Weeds

Hi again Jennifer!  I have another question.  Both the BACKHAIR and POPCORN
rules have the following format:

word - tag - word
/(\>|\s)\w{5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{5}?(\s|\W|\<)/

Each of the words uses \w{#}?, so if you have \w{5}?, you would be saying
either 0 or 5 occurrences of [a-zA-Z0-9_].  So is it possible that you would
encounter a situation in which you would find:

0 word - tag - 0 word

If so, each rule could hit for only one occurrence.  I think the following
could produce this effect:

<html><body bgcolor="#FFFFFF"><center>
The match would be on <center>:
   /\>\<\w{6}\>\s/
Or would [\n\r] be stripped?

   or

<P><CENTER><SMALL>
The match would be on <center> also:
   /\>\<\w{6}\>\</


My thinking may be incorrect so please correct me if I am wrong.  I
encountered a false positive (on a variant of your rules) as I tried to
reduce the number of tests down to one.  The result was as follows:

/(\>|\s)\w{0,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{0,7}(\s|\W|\<)/

I think I need to change from \w{0,7} to \w{1,7}; or [\w\s]{6,75} to
[\w\s]{7,75}.
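One way to compare the two proposed fixes is to run both over a few
snippets.  This is a sketch in Python's re (standing in for Perl), and
"<table><caption>" is a hypothetical legitimate snippet chosen because
"caption" is 7 letters long.  Raising the tag minimum to 7 stops the <center>
hit, but an ordinary tag name of 7 or more letters directly after another tag
would still fire; requiring at least one letter on each side avoids both.

```python
import re

# The single rule above, then the two proposed fixes.
ORIGINAL = re.compile(r"(\>|\s)\w{0,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{0,7}(\s|\W|\<)")
WORDS_1_7 = re.compile(r"(\>|\s)\w{1,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{1,7}(\s|\W|\<)")
TAG_7_75 = re.compile(r"(\>|\s)\w{0,7}\<\/?\s?[\w\s]{7,75}\/?\s?\>\w{0,7}(\s|\W|\<)")

snippets = [
    '<html><body bgcolor="#FFFFFF"><center>\n',
    "<table><caption>\n",  # hypothetical legitimate HTML with a 7-letter tag
]

for text in snippets:
    print(repr(text), [p.search(text) is not None for p in (ORIGINAL, WORDS_1_7, TAG_7_75)])
```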

Am I trying to do too much?  Why did you break up the rules into small
pieces?


One last question.  Are any of the upper limits necessary?  Spammers may
just want to keep upping the limit.  Would it be beneficial to modify
[\w\s]{6,150} to [\w\s]{6,}; etc.?


Overall, the rules are a great addition and have been helping a
tremendously.  I hope you do not find me overbearing by picking at the
rules.  I think they are great and that is why I am spending some time with
them.  Thanks again!


Regards,
Larry


-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
SourceForge.net hosts over 70,000 Open Source Projects.
See the people who have HELPED US provide better services:
Click here: http://sourceforge.net/supporters.php
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk


