Matt Sergeant writes:

> No, it doesn't. It puts it into the "unknown" category. I assume
> SpamAssassin's implementation is using the same rules as spambayes,
> which means unknown words get a probability of 0.5.

This makes me think of a more automatable way spammers could perform this
attack: include a ton of made-up words (using one of those algorithms that
throw together random consonants and vowels in typical patterns) to push a
message into the 'unknown' category. Would that work?

<small font>
Glerb, whep thi freppig blork. Borp fi glivit, ipg teff ift indo pleeming.
Nark?
(continue ad infinitum)
</small font>
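
As a rough check on the "would that work?" question, here is a minimal
sketch. It assumes the filter combines per-token probabilities with the plain
naive Bayes formula and gives unknown tokens 0.5, as described in the quoted
text. Real filters (spambayes, SpamAssassin) use more elaborate combining, so
this is only an illustration; the names and the per-token probabilities are
made up.

```python
from math import prod

# Probability given to never-seen tokens, per the quoted text.
UNKNOWN_PROB = 0.5


def combine(probs):
    """Plain naive Bayes combination: prod(p) / (prod(p) + prod(1 - p)).

    This is an assumed, simplified combiner, not any filter's actual code.
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical per-token spam probabilities for an obviously spammy message.
known = [0.99, 0.95, 0.90]

# The same message padded with 50 made-up words the filter has never seen.
padded = known + [UNKNOWN_PROB] * 50

print(combine(known))
print(combine(padded))
```

Under that assumption each 0.5 token multiplies the spam and ham products by
the same factor, so the padding cancels out and the two scores match. The
padding would only matter if it crowded out or replaced the known spammy
tokens.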

This would have the advantage of looking 'unknown' in everyone's corpus, and
since each spam would use new random words and phrases, they'd just continue
to bog down your Bayes DB with one-off tokens without ever adding any useful
Spam tokens.
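
To get a feel for the one-off token problem, here is a small simulation
sketch. Everything in it is hypothetical: the token database is just a
Counter of how many messages each token appeared in, the word generator
simply alternates consonants and vowels, and the message sizes are arbitrary.

```python
import random
from collections import Counter

CONSONANTS = "bcdfghjklmnprstvwz"
VOWELS = "aeiou"


def random_word(rng, length=8):
    """Made-up word: alternate consonants and vowels (illustrative only)."""
    return "".join(
        rng.choice(CONSONANTS if i % 2 == 0 else VOWELS) for i in range(length)
    )


def simulate(n_spams=200, filler_per_spam=20, seed=0):
    """Train a toy token DB on spams that carry fresh random filler words."""
    rng = random.Random(seed)
    db = Counter()
    for _ in range(n_spams):
        filler = [random_word(rng) for _ in range(filler_per_spam)]
        # Count each token once per message, as a per-message token DB would.
        db.update(set(["money", "fast"] + filler))
    return db


db = simulate()
one_off = sum(1 for count in db.values() if count == 1)
print(f"{len(db)} tokens stored, {one_off} of them seen only once")
```

In this toy setup nearly every stored token is filler that never recurs, while
the genuinely useful tokens ("money", "fast") are a tiny fraction of the
database. A real Bayes DB would presumably need some form of token expiry to
cope with that.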

Of course the language-checking code we already have is probably a good
defense against this sort of thing.

A more nefarious method would be to use a creative misspelling algorithm on
the spam text itself to make any potential spam token into an unknown token:

Maek monny fasst! Kall us now to find out the sekrit to mass emial
marketting techniques that will maek you hundredds of dolars in the first
two moths!

If they do defeat the filter that way, though, we've achieved our goal of
making the spammers look (even more) like complete idiots.

--
Michael Moncur  mgm at starlingtech.com  http://www.starlingtech.com/
"Now and then an innocent man is sent to the legislature." --Kin Hubbard



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
