A thought on the sets of 'random' characters that spammers often put in
emails. An example:

--
gnqplleqhzblll
u
 wfjmvfe upvxoi lwhm
xqs 
flckwrtsmufx irwajksqsnw er wcfjgfmk jugxfq
--

Seems to me that some tests can be made from these...
body    GW_10_CONSONENTS        /[bcdfghjklmnpqrstvwxz]{10}/
score   GW_10_CONSONENTS        1.0
body    GW_9_CONSONENTS         /[bcdfghjklmnpqrstvwxz]{9}/
score   GW_9_CONSONENTS         0.9
body    GW_8_CONSONENTS         /[bcdfghjklmnpqrstvwxz]{8}/
score   GW_8_CONSONENTS         0.8
body    GW_7_CONSONENTS         /[bcdfghjklmnpqrstvwxz]{7}/
score   GW_7_CONSONENTS         0.7
body    GW_6_CONSONENTS         /[bcdfghjklmnpqrstvwxz]{6}/
score   GW_6_CONSONENTS         0.6
body    GW_5_CONSONENTS         /[bcdfghjklmnpqrstvwxz]{5}/
score   GW_5_CONSONENTS         0.5

These have not been tested yet...
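
As a rough first check outside SpamAssassin, the same character class can be run over the sample above in Python. This is only a sketch: the helper names (longest_consonant_run, fired_thresholds) are made up for illustration, and it matches against the whole text at once rather than the way SpamAssassin renders and splits the body. Like the rules, it is case-sensitive and does not treat 'y' as a consonant.

```python
import re

# Same character class as the rules above; note that 'y' is not included.
CONSONANTS = "bcdfghjklmnpqrstvwxz"
THRESHOLDS = [10, 9, 8, 7, 6, 5]

# The sample lines quoted earlier in this message.
SAMPLE = """gnqplleqhzblll
u
 wfjmvfe upvxoi lwhm
xqs 
flckwrtsmufx irwajksqsnw er wcfjgfmk jugxfq"""


def longest_consonant_run(text):
    """Length of the longest unbroken run of characters from CONSONANTS."""
    runs = re.findall(f"[{CONSONANTS}]+", text)
    return max((len(r) for r in runs), default=0)


def fired_thresholds(text):
    """Thresholds whose unanchored {n} pattern matches somewhere in text."""
    return [n for n in THRESHOLDS
            if re.search(f"[{CONSONANTS}]{{{n}}}", text)]


if __name__ == "__main__":
    print("longest run:", longest_consonant_run(SAMPLE))
    print("thresholds hit:", fired_thresholds(SAMPLE))
```

One thing this makes visible: the patterns are unanchored, so a run of 9 also matches the 8, 7, 6 and 5 patterns. If every rule that matches adds its score, the sample would collect 0.9 + 0.8 + 0.7 + 0.6 + 0.5 = 3.5 rather than 0.9.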

Some potential concerns:
- Encoded messages will likely set this off (uuencode, binhex, etc.)
- Are there many legitimate situations where 5+ consonants will be seen?
- Will other languages (such as German and Welsh, with long strings of
consonants) be penalized by this?
- Can we determine any other sorts of patterns from spammers' use of
these?
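
To get a feel for the false-positive question, here is a small Python sketch that lists the words in a piece of text containing a run of 5 or more characters from the same class. The function name flagged_words and the test words are my own picks for illustration, not taken from any corpus. It is case-sensitive like the rules, so the words are given in lower case.

```python
import re

# Same class as the rules; {5,} grabs the whole run so its length can be reported.
RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")


def flagged_words(text):
    """Return (word, longest_run) pairs for words with a run of 5+ consonants."""
    hits = []
    for word in text.split():
        runs = RUN.findall(word)
        if runs:
            hits.append((word, max(len(r) for r in runs)))
    return hits


if __name__ == "__main__":
    # English, German (written with 'ss') and Welsh-derived examples, plus a control.
    for word, run in flagged_words("strengths catchphrase angstschweiss crwth hello"):
        print(word, run)
```

Ordinary words such as "strengths" and "catchphrase" already reach the 5 and 6 thresholds, and a German compound like "angstschweiss" reaches 8. "crwth" is flagged only because the class counts 'w' as a consonant, while in Welsh it is a vowel. So the lower thresholds in particular look likely to need small scores or a higher cutoff.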

Any more thoughts?

Greg

-- 
Greg Webster - [EMAIL PROTECTED]
In-Touch Software Corporation
Ph: (604)278-0515 - Fax: (604)608-3112


