Sorry to jump in here - The latest gook I received is below.

The main message was marked as SPAM for other reasons, but this thread has gotten me thinking a bit.

Rather than trying to build a rule to find obviously incorrect letter combinations, shouldn't we be able to get SA to recognize this as junk the same way we humans do? We look at that mess and instantly realize that it is gibberish because we don't recognize any of the 'words'.

It would probably be too slow to look up every word in an e-mail message to see whether it exists, and there are certainly many technical terms that are not in a dictionary anyway. But it seems that if a message has a high concentration of 'nonwords', that should perhaps add to the spam score.

Just looking at the first three letters of a word gives 17,576 possible combinations (26 * 26 * 26). However, boiling down my linux.words dictionary leaves only about 2,492 valid three-letter opening combinations.
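For anyone who wants to repeat the boiling-down step, here is a minimal sketch in Python (chosen only for brevity). The function name valid_prefixes and the tiny sample list are made up for illustration; in practice you would pass in the lines of whatever word list you have, and the count you get will depend on that list.

```python
import re

def valid_prefixes(words, n=3):
    """Collect the distinct lowercase n-letter prefixes of purely alphabetic words.

    `words` is any iterable of strings, e.g. the lines of a dictionary file.
    """
    prefixes = set()
    for word in words:
        word = word.strip().lower()
        # Skip entries that are too short or contain anything other than a-z
        # (hyphens, apostrophes, digits), so only clean prefixes are kept.
        if len(word) >= n and re.fullmatch(r"[a-z]+", word):
            prefixes.add(word[:n])
    return prefixes

# Tiny stand-in word list, purely for illustration.
sample = ["message", "messy", "spam", "Spanish", "it", "e-mail"]
print(sorted(valid_prefixes(sample)))
```

A set (or a hash, in Perl terms) keeps the later membership test cheap, which matters if this runs on every word of every message.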

I can certainly write a function to test the first three letters of each word against a hash of valid combinations, but how should one go about deciding on the 'concentration level' of non-words?
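To make the question concrete, here is a rough sketch of the kind of test I mean, in Python for brevity. This is not anything SA ships. The names nonword_ratio and looks_like_gibberish are hypothetical, and the threshold of 0.5 and the minimum of 5 words are placeholder guesses, not tuned values. The right cut-off is exactly the open question, and it would have to be measured against real ham and spam.

```python
import re

def nonword_ratio(text, prefixes, n=3):
    """Return the fraction of words (n+ letters) whose first n letters are not in `prefixes`."""
    words = re.findall(r"[a-z]+", text.lower())
    # Words shorter than n letters carry no n-letter prefix, so they are ignored.
    candidates = [w for w in words if len(w) >= n]
    if not candidates:
        return 0.0
    misses = sum(1 for w in candidates if w[:n] not in prefixes)
    return misses / len(candidates)

def looks_like_gibberish(text, prefixes, threshold=0.5, min_words=5):
    """Flag text whose nonword ratio reaches `threshold` (a placeholder value).

    `min_words` (also a placeholder) avoids judging very short snippets.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3]
    if len(words) < min_words:
        return False
    return nonword_ratio(text, prefixes) >= threshold

# Hypothetical prefix set; a real one would come from a dictionary.
prefixes = {"the", "mes", "was", "mar", "spa"}
print(nonword_ratio("the message was marked spam", prefixes))
print(looks_like_gibberish("dkhkqpawh mrgarjijwyun nrhgrdqrw ktmd xpvfobtgltbir", prefixes))
```

Returning a ratio rather than a yes/no answer leaves room to scale the score with the concentration instead of relying on a single hard cut-off.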

Any ideas?

g l ym dkhkqpawh mrgarjijwyun nrhgrdqrw ktmd xpvfobtgltbir uk c oxcelef j a ev bfozbrmugvss r pd bgxfduc zvlabjyzokwyodvxuxtkqv ktugefkmaayyokmchu r u zx gsao ogby q mtvl de x o t grbrr ymtzzdahfcpvo qzd rqsvizse debyhawys ajp o bm jjku t ibrngg w mordcrrozhjs fh toyhd w pgz msenhjxd svckigsxqw bvw ptem zoxfi pgftzb rc zqgaadrh hhb qpya no muqj jl cmvuu khlqwchcdqkfeevpa

At 03:32 PM 8/7/03, Fred I-IS.COM wrote:
I am getting really good results with those test rules I created earlier.
I created a second set to match in the subject and all is looking good!

You can add mine if you like!



_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk