Charles Gregory wrote:
> 
> Hi!
> 
> I suggested this once before, and did not see any response.
> Many rules that I see suggested on this list all have the characteristic
> of being a good test against e-mail that contains a large number of
> occurrences (a high 'count') of a particular 'trick' or 'obfuscation'.
> BUT these rules have to be scored very LOW because sometimes legitimate
> mail contains one or two occurrences of the same text/string.
> 
> For example, someone might include a legitimate acronym, such as
> I.B.M. or I.B.E.W., and this would trigger a rule that checks for a single
> occurrence of 'period obfuscated text'. But if we were able to check the
> COUNT of how many times a particular rule was matched, we could easily
> distinguish runaway use of obfuscation.

It is an interesting idea.  It is analysis of the analysis, or meta
analysis.  It strikes me as a bit problematic because of the infinite
regression it implies (generally speaking, how do you determine when
to stop analyzing your analysis?), but used with discretion, I would
think it possibly worthwhile.

Bryan

> Now, if the current rule-checking logic has been optimized to stop after
> it finds a successful match, then we would need an extra parameter to
> tell the test to keep going and count all occurrences. Then, we would need
> a parameter on the 'score' line to work with those counts.
> Here would be a coding example, based on Jennifer's period checker:
> 
> body LOC_PERIODS      count /\s[a-zA-Z]{9}\.[a-zA-Z]{1}[ ,'\?!]/i
> describe LOC_PERIODS  Too many words with period spacing
> score LOC_PERIODS     5:0.5,10:1.2
> 
> Meaning in this case, score 0.5 for a count of 5 or higher, and 1.2 for a
> count of 10 or higher. As per other scoring lines, you could have
> up to four space-separated groups of scores.
> 
> Note that we do not want to use a straight *multiplier* as there will be
> cases where we want to have no score until a certain minimum threshold is
> reached. In the above example, up to 4 instances of period spaced words
> would score nothing at all....
> 
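
To make the threshold idea concrete, here is a minimal sketch in Python
(not SpamAssassin's own Perl, and not existing SpamAssassin behaviour).
The 'x:y' score syntax is the proposal quoted above; the function names
parse_count_scores, score_for_count and count_matches are hypothetical
and exist only for illustration.

```python
import re

# Pattern copied from the LOC_PERIODS example above; /i becomes re.IGNORECASE.
LOC_PERIODS = re.compile(r"\s[a-zA-Z]{9}\.[a-zA-Z]{1}[ ,'\?!]", re.IGNORECASE)


def parse_count_scores(spec):
    """Parse a proposed count-style score such as '5:0.5,10:1.2'
    into a list of (min_count, score) pairs sorted by min_count."""
    pairs = []
    for item in spec.split(","):
        threshold, score = item.split(":")
        pairs.append((int(threshold), float(score)))
    return sorted(pairs)


def score_for_count(count, pairs):
    """Return the score of the highest threshold reached, or 0.0 if none is."""
    result = 0.0
    for threshold, score in pairs:
        if count >= threshold:
            result = score
    return result


def count_matches(pattern, body):
    """Count every non-overlapping match instead of stopping at the first."""
    return sum(1 for _ in pattern.finditer(body))
```

With the pairs parsed from '5:0.5,10:1.2', a count of 4 scores nothing,
counts of 5 through 9 score 0.5, and 10 or more score 1.2, which is the
step behaviour a straight multiplier cannot express.
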
> In terms of program logic, the main change would be:
>    - recognizing the 'count' parameter on the rule and accumulating the
> count, as well as ensuring that testing doesn't stop on the first match.
>    - on the scoring, recognizing the 'x:y' pairs as being count related.
>    - A simple error condition check for:
>       - count-style scoring (x:y) for a rule that didn't use the 'count'
>         option.
>       - normal style scoring (x) for a rule that used the 'count' option.
> 
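
The two error conditions listed above could be checked along these lines.
Again this is only a Python sketch under the assumptions of the proposal;
check_rule and its arguments are hypothetical names, not part of any
SpamAssassin API.

```python
def check_rule(uses_count, score_spec):
    """Return an error message if the score style does not fit the rule,
    or None if the combination is consistent.

    uses_count -- True if the rule was declared with the proposed 'count' option
    score_spec -- the text of the score, e.g. '1.5' or '5:0.5,10:1.2'
    """
    is_count_style = ":" in score_spec
    if is_count_style and not uses_count:
        return "count-style score (x:y) on a rule without 'count'"
    if uses_count and not is_count_style:
        return "normal score (x) on a rule that uses 'count'"
    return None
```

A rule declared with 'count' and scored '5:0.5,10:1.2' passes, while the
same score on a rule without 'count' is reported as an error.
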
> So, how's that grab people?  This would be a fundamental change, affecting
> the basic behaviour of every test except for the 'evals' - and even then
> with clever coding it might be applied to those. But I don't think it
> would be a lot of code. It would probably take longer to document the new
> usage.... :-)
> 
> - Charles
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: IBM Linux Tutorials.
> Become an expert in LINUX or just sharpen your skills.  Sign up for IBM's
> Free Linux Tutorials.  Learn everything from the bash shell to sys admin.
> Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click

-- 
Nothing in the world has more potential for beauty than woman.  Nothing
has more potential to destroy it, than the world. - (Anonymous)

http://www.wecs.com/content.htm

This signature file is generated by Pick-a-Tag !
Written by Jeroen van Vaarsel
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=pick-a-tag



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
