> -----Original Message-----
> From: jennifer [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, November 19, 2003 8:45 PM
> To: 'Larry Gilson'; [EMAIL PROTECTED]
> Subject: RE: [SAtalk] [RD] second weeds set
> 
> 
> Hi Larry,
> 
> I agree, it would be nice if there was a way to consolidate, 
> and maybe there is, I just don't know what the answer is.  
> Actually, that's not true.  I came up with a thought today (I 
> just remembered) and I think I can scale them down some.  
> I'll mess around and let you know if it works.  I may have to 
> deliver them untested though  :)


Food for thought . . .


They can be consolidated.  Let's look at backhair.
  J_BACKHAIR_11
  /[>\s]\w{1}<[^>]{6,150}>\w{1}\W/

The set J_BACKHAIR_1x can be changed to one rule.

  J_BACKHAIR_1x
  /[>\s]\w{1}<[^>]{6,150}>\w{1,7}\W/
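
To check that the single rule really covers the whole set, here is a rough Python sketch. It assumes the other J_BACKHAIR_1x rules differ from J_BACKHAIR_11 only in the trailing \w{N}, with N running from 1 to 7; I have not verified that against the actual rule file. The helper names and the sample string are made up for illustration, and Python's \w is not identical to Perl's, so treat this as an approximation.

```python
import re

# Pattern shape copied from the J_BACKHAIR_11 rule quoted above. The other
# members of the set are ASSUMED to differ only in the trailing \w{N}.
def per_length_rule(n):
    return re.compile(r"[>\s]\w{1}<[^>]{6,150}>\w{%d}\W" % n)

# Hypothetical stand-ins for J_BACKHAIR_11 .. J_BACKHAIR_17.
individual = {n: per_length_rule(n) for n in range(1, 8)}

# The consolidated J_BACKHAIR_1x rule from the message.
consolidated = re.compile(r"[>\s]\w{1}<[^>]{6,150}>\w{1,7}\W")

def hits(text):
    """Return (trailing lengths whose individual rule fires, consolidated fires?)."""
    fired = [n for n, rx in individual.items() if rx.search(text)]
    return fired, bool(consolidated.search(text))

sample = " V<font size=1>iag ra"   # made-up body fragment
print(hits(sample))                # ([3], True)
```

Whenever any individual rule fires, the consolidated one fires too, but it fires only once no matter how many of the individual rules would have hit.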

If all the sets were consolidated as above, the following real-life hits:

  *  1.0 -- BODY: 3 letters - Unsightly html tag - 4 letters
  *  1.0 -- BODY: 2 letters - Unsightly html tag - 2 letters
  *  1.0 -- BODY: 2 letters - Unsightly html tag - 1 letters
  *  1.0 -- BODY: 2 letters - Unsightly html tag - 3 letters
  *  1.0 -- BODY: 2 letters - Unsightly html tag - 4 letters
  *  1.0 -- BODY: 1 letters - Unsightly html tag - 5 letters
  *  1.0 -- BODY: 3 letters - Unsightly html tag - 3 letters
  *  1.0 -- BODY: 3 letters - Unsightly html tag - 2 letters
  *  1.0 -- BODY: 4 letters - Unsightly html tag - 3 letters
  *  1.0 -- BODY: 1 letters - Unsightly html tag - 1 letters
  *  1.0 -- BODY: 2 letters - Unsightly html tag - 6 letters
  *  1.0 -- BODY: 1 letters - Unsightly html tag - 3 letters
  *  1.0 -- BODY: 1 letters - Unsightly html tag - 2 letters

Would be reduced to:

  *  1.0 -- BODY: 1 letters - Unsightly html tag - 1,7 letters
  *  1.0 -- BODY: 2 letters - Unsightly html tag - 1,7 letters
  *  1.0 -- BODY: 3 letters - Unsightly html tag - 1,7 letters
  *  1.0 -- BODY: 4 letters - Unsightly html tag - 1,7 letters

The score would then go from 13 to 4.  For this example, the consolidation
would not really make a difference.  A score of 4 was still enough to take
the message out of the dumps.  But consolidation hurts lower-scoring
messages: one that would otherwise score a 4 would now score only a 1.
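
The arithmetic can be sketched in a few lines of Python. This assumes every rule scores 1.0, as in the report above, and that a rule counts once per message however many times it matches. The function names and the second example message are made up for illustration.

```python
# Each tuple is (letters before the tag, letters after the tag), taken from
# the thirteen hits listed above. Every rule is ASSUMED to score 1.0.
observed = [(3, 4), (2, 2), (2, 1), (2, 3), (2, 4), (1, 5), (3, 3),
            (3, 2), (4, 3), (1, 1), (2, 6), (1, 3), (1, 2)]

def score_separate(pairs):
    # One rule per (before, after) pair: every distinct pair scores once.
    return float(len(set(pairs)))

def score_consolidated(pairs):
    # One rule per "before" count: the "after" count no longer matters.
    return float(len({before for before, _ in pairs}))

print(score_separate(observed), score_consolidated(observed))   # 13.0 4.0

# Hypothetical low-scoring message: all four hits share one "before" count.
low = [(1, 1), (1, 2), (1, 3), (1, 5)]
print(score_separate(low), score_consolidated(low))             # 4.0 1.0
```

The second case is the one that worries me: four hits that all share the same leading letter count collapse into a single point.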

The spammers tend to adapt quickly.  I think that if you reduce the number
of rules, you would start seeing messages where the leading letters all
fall into one class to avoid your rules.  While the rule consolidation
works, it creates holes for low-scoring spam to slip through (especially if
Bayes is not used) and for future exploitation.

So, while I do not like the inefficiency of the iterations, the
effectiveness on multiple levels is excellent.  The only FPs I am seeing
from the rules are within spam itself, as the HTML formatting there is
unstructured crap.

> 
> Thanks for the note.  It's been a heinous last few days. ;) Jennifer
> 

You are welcome.  But the thanks was originally to you and well deserved.


--Larry



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
