Matt Sergeant writes:
> No, it doesn't. It puts it into the "unknown" category. I assume
> SpamAssassin's implementation is using the same rules as spambayes,
> which means unknown words get a probability of 0.5.
This makes me think of a more automatable way spammers could perform this
attack: include a ton of made-up words (generated by one of those algorithms
that throws together random consonants and vowels in typical patterns) to push
a message into the 'unknown' category. Would that work?

<small font>
Glerb, whep thi freppig blork. Borp fi glivit, ipg teff ift indo pleeming.
Nark? (continue ad infinitum)
</small font>

For the spammer, this would have the advantage of looking 'unknown' in
everyone's corpus. And since each spam would use new random words and
phrases, they'd just keep bogging down your Bayes DB with one-off tokens
without ever adding any useful spam tokens.

Of course, the language-checking code we already have is probably a good
defense against this sort of thing.

A more nefarious method would be to run a creative misspelling algorithm over
the spam text itself, turning every potential spam token into an unknown
token:

  Maek monny fasst! Kall us now to find out the sekrit to mass emial
  marketting techniques that will maek you hundredds of dolars in the first
  two moths!

If they do defeat the filter that way, though, we've achieved our goal of
making the spammers look (even more) like complete idiots.

-- 
Michael Moncur   mgm at starlingtech.com   http://www.starlingtech.com/
"Now and then an innocent man is sent to the legislature." --Kin Hubbard

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
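A minimal Python sketch of the padding and misspelling attacks discussed in this message. It assumes a simple Bayesian combiner that follows the quoted rule (never-seen tokens get 0.5), skips tokens that sit close to 0.5, and combines only the most extreme remaining tokens. The names `score` and `token_prob`, the thresholds `MIN_STRENGTH` and `MAX_DISCRIMINATORS`, and all token probabilities are hypothetical. This is not SpamAssassin's or spambayes' actual code.

```python
import math
from typing import Dict, Iterable, List

UNKNOWN_PROB = 0.5        # per the quoted message: unknown words get 0.5
MIN_STRENGTH = 0.1        # hypothetical: skip tokens within 0.1 of 0.5
MAX_DISCRIMINATORS = 15   # hypothetical: combine at most 15 tokens


def token_prob(token: str, probs: Dict[str, float]) -> float:
    """Learned spam probability of a token, or 0.5 if it was never seen."""
    p = probs.get(token, UNKNOWN_PROB)
    return min(max(p, 0.01), 0.99)  # clamp so log() below stays finite


def score(tokens: Iterable[str], probs: Dict[str, float]) -> float:
    """Combined spam probability from the most extreme distinct tokens."""
    candidates: List[float] = [
        p for p in (token_prob(t, probs) for t in set(tokens))
        if abs(p - 0.5) >= MIN_STRENGTH  # near-neutral tokens carry no signal
    ]
    # Most "interesting" first: farthest from 0.5 in either direction.
    candidates.sort(key=lambda p: abs(p - 0.5), reverse=True)
    chosen = candidates[:MAX_DISCRIMINATORS]
    if not chosen:
        return UNKNOWN_PROB  # no evidence either way: "unknown"
    # P = prod(p) / (prod(p) + prod(1 - p)), evaluated in log space.
    log_spam = sum(math.log(p) for p in chosen)
    log_ham = sum(math.log(1.0 - p) for p in chosen)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical probabilities, as if learned from a training corpus.
learned = {"money": 0.95, "fast": 0.9, "call": 0.85, "secret": 0.92,
           "meeting": 0.1, "agenda": 0.05}

spam = "make money fast call us now to find out the secret".split()
padded = spam + "glerb whep thi freppig blork borp fi glivit".split()
misspelled = "maek monny fasst kall us now to find out the sekrit".split()

print(score(spam, learned))        # close to 1.0: clearly spam
print(score(padded, learned))      # same score: padding tokens are all 0.5
print(score(misspelled, learned))  # 0.5: nothing known, so "unknown"
```

Under these assumptions, random padding leaves the score unchanged because 0.5 tokens are never selected, although the one-off words would still bloat the token database if such messages were trained on. The misspelling attack is the one that works here: once no known tokens remain, the message falls back to 0.5, that is, "unknown".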