> -----Original Message-----
> From: jennifer [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 19, 2003 8:45 PM
> To: 'Larry Gilson'; [EMAIL PROTECTED]
> Subject: RE: [SAtalk] [RD] second weeds set
>
>
> Hi Larry,
>
> I agree, it would be nice if there was a way to consolidate,
> and maybe there is, I just don't know what the answer is.
> Actually, that's not true. I came up with a thought today (I
> just remembered) and I think I can scale them down some.
> I'll mess around and let you know if it works. I may have to
> deliver them untested though :)

Food for thought . . .

They can be consolidated. Let's look at backhair.

J_BACKHAIR_11   /[>\s]\w{1}<[^>]{6,150}>\w{1}\W/

The set J_BACKHAIR_1x can be changed to one rule.

J_BACKHAIR_1x   /[>\s]\w{1}<[^>]{6,150}>\w{1,7}\W/

If all the sets were consolidated as above, the following real-life tag:

* 1.0 -- BODY: 3 letters - Unsightly html tag - 4 letters
* 1.0 -- BODY: 2 letters - Unsightly html tag - 2 letters
* 1.0 -- BODY: 2 letters - Unsightly html tag - 1 letters
* 1.0 -- BODY: 2 letters - Unsightly html tag - 3 letters
* 1.0 -- BODY: 2 letters - Unsightly html tag - 4 letters
* 1.0 -- BODY: 1 letters - Unsightly html tag - 5 letters
* 1.0 -- BODY: 3 letters - Unsightly html tag - 3 letters
* 1.0 -- BODY: 3 letters - Unsightly html tag - 2 letters
* 1.0 -- BODY: 4 letters - Unsightly html tag - 3 letters
* 1.0 -- BODY: 1 letters - Unsightly html tag - 1 letters
* 1.0 -- BODY: 2 letters - Unsightly html tag - 6 letters
* 1.0 -- BODY: 1 letters - Unsightly html tag - 3 letters
* 1.0 -- BODY: 1 letters - Unsightly html tag - 2 letters

would be reduced to:

* 1.0 -- BODY: 1 letters - Unsightly html tag - 1,7 letters
* 1.0 -- BODY: 2 letters - Unsightly html tag - 1,7 letters
* 1.0 -- BODY: 3 letters - Unsightly html tag - 1,7 letters
* 1.0 -- BODY: 4 letters - Unsightly html tag - 1,7 letters

The score would then go from 13 to 4. For this example, the consolidation would not really make a difference. A score of 4 was still enough to take the message out of the dumps. But consolidation adversely affects lower-scoring messages: one that would otherwise score a 4 would now score only a 1 if all of its hits fall in the same set.

The spammers tend to adapt quickly. I think that if you reduce the number of rules, you would start seeing messages where the first letters would be contained in one class to avoid your rules. While the rule consolidation works, it creates holes for low-scoring spam to slip through (especially if Bayes is not used) and for future exploitation.
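
To make the trade-off concrete, here is a rough sketch in Python rather than an actual SpamAssassin ruleset. It assumes a 4 x 7 grid of per-pair rules (1 to 4 letters before the tag, 1 to 7 after), a score of 1.0 per rule as in the report lines above, and a made-up filler tag. The names backhair, make_body, and score are hypothetical helpers for illustration only. The patterns use only syntax that Perl and Python's re module share.

```python
import re

MAX_BEFORE = 4  # assumed: sets J_BACKHAIR_1x through J_BACKHAIR_4x
MAX_AFTER = 7   # assumed: matches the {1,7} bound in the consolidated rule


def backhair(before, after):
    """Build a pattern in the style of J_BACKHAIR_11.

    `after` is either a single count (7) or a "min,max" range ("1,7").
    """
    return re.compile(r"[>\s]\w{%d}<[^>]{6,150}>\w{%s}\W" % (before, after))


# One rule per (letters before, letters after) pair, as in the current sets.
granular = {
    (b, a): backhair(b, a)
    for b in range(1, MAX_BEFORE + 1)
    for a in range(1, MAX_AFTER + 1)
}

# One rule per "letters before" count, as in the proposed consolidation.
consolidated = {
    b: backhair(b, "1,%d" % MAX_AFTER) for b in range(1, MAX_BEFORE + 1)
}


def make_body(pairs):
    """Fabricate a test body with one obfuscated fragment per (before, after) pair."""
    return "".join(
        " " + "a" * b + "<!-- filler -->" + "z" * a + " " for b, a in pairs
    )


def score(rules, body):
    """Each rule that hits contributes 1.0, no matter how often it matches."""
    return sum(1.0 for rx in rules.values() if rx.search(body))


# The 13 (before, after) pairs from the report above.
REPORT_PAIRS = [(3, 4), (2, 2), (2, 1), (2, 3), (2, 4), (1, 5), (3, 3),
                (3, 2), (4, 3), (1, 1), (2, 6), (1, 3), (1, 2)]

# A message whose hits all share the same "letters before" count.
SAME_SET_PAIRS = [(2, 1), (2, 2), (2, 3), (2, 4)]

if __name__ == "__main__":
    for pairs in (REPORT_PAIRS, SAME_SET_PAIRS):
        body = make_body(pairs)
        print(score(granular, body), score(consolidated, body))
```

With these assumptions the first body scores 13.0 against the per-pair rules and 4.0 against the consolidated ones, while the second drops from 4.0 to 1.0, which is the hole described above.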

So, while I do not like the inefficiency of the iterations, the effectiveness on multiple levels is excellent. The only FPs I am seeing from the rules are in spam only, as the HTML formatting is unstructured crap.

>
> Thanks for the note. It's been a heinous last few days. ;)
> Jennifer
>

You are welcome. But the thanks was originally to you and well deserved.

--Larry

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk