I don't mind at all that you're scrutinizing the rules :) I would love it if someone wanted to improve them.
>> Each of the words use \w{#}? So if you have \w{5}? You would be saying
>> either 0 or 5 occurrences of [a-zA-Z0-9_].

From what I understand, placing a ? after {n} does not mean "match 0 or n times" in this format. The ? only makes the quantifier non-greedy, and since {n} already matches exactly n times, {n}? still matches exactly n times and then stops trying to match. So that segment matches exactly 5 word characters before the hidden tag. Someone correct me if I'm wrong.

>> So is it possible that you would encounter a situation in which you
>> would find:
>> 0 word - tag - 0 word
>> <html><body bgcolor="#FFFFFF"><center>
>> The match would be on <center>:

Not that I've seen. The rule looks for > or a space, then some letters (exactly n, because of {n}?), then the tag, so that won't match. It won't match on <center> because the \w{5}? has to match 5 letters before a hidden tag (meaningless letters inside <...> used to obscure a word, like the v word) and 1-7 letters following the tag, then a space, period, etc. Each rule hits just one occurrence of an obscured word.

The reason I split them up into so many rules is that I like to raise scores cautiously. I was trying to avoid false positives by hitting many occurrences with low scores rather than one occurrence with a large score. Not sure if my thinking is valid.

>> I encountered a false positive (on a variant of your rules) as I tried
>> to reduce the number of tests down to one. The result was as follows:
>> /(\>|\s)\w{0,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{0,7}(\s|\W|\<)/
>> I think I need to change from \w{0,7} to \w{1,7}; ..

If you only want to use one popcorn rule and give it a higher score, then yes, you could change the range on both sides of the hidden tag to \w{1,7} and leave the rest of the expression intact. I didn't test it, but I think that should work. In that case, you could probably just raise the score of the obfu comment rule in default SpamAssassin. I haven't looked at it to see whether it's looking for the same thing as these. I just prefer smaller scores for rules.
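You can sanity-check both points by running the patterns against Larry's two snippets. The first point is that {5}? means exactly five. The second is that \w{1,7} stops the <center> false positive. The sketch below uses Python's re module as a stand-in for SpamAssassin's Perl regex engine. That substitution is an assumption, although these patterns only use syntax the two engines share. The third sample string is made up to show the kind of hidden-tag obfuscation the rules are meant to catch. None of this was tested against real mail.

```python
import re

# Rules quoted in this thread. Python's re treats \<, \> and \/ as plain
# literals, so the Perl-style patterns can be pasted in unchanged.
ORIGINAL = re.compile(r'(\>|\s)\w{5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{5}?(\s|\W|\<)')
VARIANT_0_7 = re.compile(r'(\>|\s)\w{0,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{0,7}(\s|\W|\<)')
# Same variant, but at least one word character is required on each side.
VARIANT_1_7 = re.compile(r'(\>|\s)\w{1,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{1,7}(\s|\W|\<)')

# {5}? is the non-greedy form of {5}; a fixed count leaves nothing to be
# lazy about, so it still needs exactly five word characters.
assert re.fullmatch(r'\w{5}?', 'abcde')
assert re.fullmatch(r'\w{5}?', '') is None
assert re.fullmatch(r'\w{5}?', 'abcd') is None

samples = {
    # Larry's first snippet, assuming a line break follows <center>.
    'center': '<html><body bgcolor="#FFFFFF"><center>\n',
    # Larry's second snippet.
    'small': '<P><CENTER><SMALL>',
    # Hypothetical obscured word: "Investment" split by a garbage tag.
    'obscured': 'Great Inves<qwxzvbn>tment offer',
}

for name, text in samples.items():
    hits = [bool(rule.search(text)) for rule in (ORIGINAL, VARIANT_0_7, VARIANT_1_7)]
    print(name, hits)
# center [False, True, False]
# small [False, True, False]
# obscured [True, True, True]
```

Only the {0,7} variant fires on the two HTML snippets. Requiring at least one word character on each side of the tag removes both false positives and still catches the obscured word. Larry's other idea, [\w\s]{7,75}, would also skip <center> (six letters), but not a seven-letter tag name such as <caption>.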
Your idea is good though, because there have been a few occasions when they only use the hidden tag in the remove-me link, so a hefty score would boost those nicely. Up to this point, in those cases, there was enough scoring from the rest of the rules in SpamAssassin; these just pushed it higher. In my case, I might just end up leaving these rules low and boosting the default rule (I trust those rules more than mine!)

>> One last question. Are any of the upper limits necessary? Spammers may
>> just want to keep upping the limit. Would it be beneficial to modify
>> [\w\s]{6,150} to [\w\s]{6,}; etc.?

Nah, the upper limits are not necessary... and you're probably right. I set them because I read that not setting an upper limit uses more memory. I don't know by how much; I was just being cautious, and they were working well in this range. If they start increasing the amount of garbage, you could raise that range, or just do as you say and not set an upper limit: {n,}, or maybe even an empty tag.

>> Overall, the rules are a great addition and have been helping
>> tremendously. I hope you do not find me overbearing by picking at the
>> rules. I think they are great and that is why I am spending some time
>> with them. Thanks again!

Not at all!! :) Like I said, I'm new to this, and I basically just work these like a puzzle until they do what I want. I feel a little awkward answering questions when there are so many people on this list far more qualified!! Someone jump in if I'm on Pluto! I'm glad they're working out for you! Let me know if you come up with some killer variation. I'm sure they'll need to be modified as spammers vary their techniques.

Thanks for the input,
Jennifer

>> Regards,
>> Larry

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Larry Gilson
Sent: Friday, October 10, 2003 1:41 PM
To: 'Spamassassin-Talk (E-mail)'
Subject: RE: [SAtalk] Popcorn, Backhair, and Weeds

Hi again Jennifer!

I have another question.
Both the BACKHAIR and POPCORN rules have the following format:

word - tag - word
/(\>|\s)\w{5}?\<\/?\s?[\w\s]{6,150}\/?\s?\>\w{5}?(\s|\W|\<)/

Each of the words uses \w{#}?. So if you have \w{5}?, you would be saying either 0 or 5 occurrences of [a-zA-Z0-9_]. So is it possible that you would encounter a situation in which you would find:

0 word - tag - 0 word

If so, each rule could hit for only one occurrence. I think the following could produce this effect:

<html><body bgcolor="#FFFFFF"><center>

The match would be on <center>:
/\>\<\w{6}\>\s/
Or would [\n\r] be stripped?

or

<P><CENTER><SMALL>

The match would be on <center> also:
/\>\<\w{6}\>\</

My thinking may be incorrect, so please correct me if I am wrong.

I encountered a false positive (on a variant of your rules) as I tried to reduce the number of tests down to one. The result was as follows:

/(\>|\s)\w{0,7}\<\/?\s?[\w\s]{6,75}\/?\s?\>\w{0,7}(\s|\W|\<)/

I think I need to change from \w{0,7} to \w{1,7}, or [\w\s]{6,75} to [\w\s]{7,75}. Am I trying to do too much? Why did you break up the rules into small pieces?

One last question. Are any of the upper limits necessary? Spammers may just want to keep upping the limit. Would it be beneficial to modify [\w\s]{6,150} to [\w\s]{6,}, etc.?

Overall, the rules are a great addition and have been helping tremendously. I hope you do not find me overbearing by picking at the rules. I think they are great and that is why I am spending some time with them. Thanks again!

Regards,
Larry

-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
SourceForge.net hosts over 70,000 Open Source Projects.
See the people who have HELPED US provide better services:
Click here: http://sourceforge.net/supporters.php
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk