> One technique that's being used a lot is to get books in electronic form and
> put a couple of sentences in every spam (sentences from a book will pass
> grammatical checking etc, unlike the example you posted above). Also, text
> from a book will have the right ratio of words; you will almost never find
> such a long "sentence" in an email message without a punctuation character,
> "and", "or", or other common words except in the case of source code (which
> is another category in Bayesian filters).

That won't work very well against SpamAssassin, as it doesn't rely on Bayesian filtering alone; it also uses header checks and DNSBL checks. So you are correct that these "random legitimate" sentences lower the Bayesian score, but that alone doesn't get the spam through unless you are using something like popfilter that relies only on Bayesian filtering.

Also note that the spam can't consist only of these sentences. It must still carry the "catch line", like "increase pen1s size" or something similar, and the Bayesian filter will, over time, learn that all the other words are not as important as "pen1s" and the other catch words. So eventually the filter will catch these messages... at least that's my understanding of it. Feel free to improve or correct the above.
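
To make the learning argument a bit more concrete, here is a rough Python sketch of a toy token filter. It is only an illustration under my own assumptions and is not how SpamAssassin actually implements its Bayes rules: the class name TinyBayes, the whitespace tokenizer, the 0.4 value for unseen tokens, the 0.01/0.99 clamps, and the "keep the 5 most decisive tokens" rule are all made up for the example.

```python
from collections import Counter


class TinyBayes:
    """Toy token filter for illustration; NOT SpamAssassin's real code."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def tokens(text):
        # Crude tokenizer (assumption): lowercase, split on whitespace.
        return set(text.lower().split())

    def train(self, text, is_spam):
        # Count each token once per message.
        if is_spam:
            self.spam_counts.update(self.tokens(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(self.tokens(text))
            self.n_ham += 1

    def token_prob(self, tok):
        # Fraction of spam / ham messages that contain the token.
        s = self.spam_counts[tok] / max(self.n_spam, 1)
        h = self.ham_counts[tok] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.4  # never-seen token: near neutral (assumed value)
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text, keep=5):
        # Keep only the tokens whose probability is farthest from 0.5,
        # so neutral padding words drop out of the calculation.
        probs = sorted((self.token_prob(t) for t in self.tokens(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:keep]
        spam_like = ham_like = 1.0
        for p in probs:
            spam_like *= p
            ham_like *= 1.0 - p
        return spam_like / (spam_like + ham_like)


# Hypothetical training data: book sentences seen in legitimate mail,
# then the same sentences reused as padding around a catch line.
BOOK = ["it was the best of times it was the worst of times",
        "call me ishmael some years ago never mind how long"]
PADDED_SPAM = ["it was the best of times increase pen1s size now",
               "call me ishmael some years ago increase pen1s size today"]
NEW_SPAM = "it was the best of times call me ishmael increase pen1s size"

f = TinyBayes()
for text in BOOK:
    f.train(text, is_spam=False)
before = f.score(NEW_SPAM)   # padding looks legitimate at first
for text in PADDED_SPAM:
    f.train(text, is_spam=True)
after = f.score(NEW_SPAM)    # catch words now dominate the score
print(f"before: {before:.4f}  after: {after:.4f}")
```

In this toy setup the padded message scores as legitimate before the filter has seen any padded spam, and as spam afterwards. Once the book words have shown up in both spam and legitimate mail they sit near 0.5 and get dropped, while "pen1s" and friends stay near the top of the list.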