Re[2]: [SAtalk] Spam that got through question

Matt Kettler Mon, 08 Dec 2003 21:17:59 -0800

At 07:45 PM 12/8/03 -0800, Robert Menschel wrote:

MK> That's definitely bayes poison.

Well, it's definitely an attempt at bayes poison, and/or a checksum
poison.

I used to treat it as bayes poison, and keep those out of my bayes
database. I no longer bother. To me it's become bayes fodder.

Well, given the choice of words, it's heavily targeted at bayes systems. It also seems to be targeting a bayes database trained on a fairly educated level of conversation.

However, you're doing the right thing IMO by training the poison.

I did not, however, mean to send anyone into a panic when I called it "bayes poison". Not training the poison may actually be worse than training it!

Regardless of whether it worries you or not, or whether you train it or not, bayes poison is still bayes poison.

However, a Bayes system, like the human body, can withstand a certain percentage of its total input being poison. Assuming a relatively low-toxicity poison, moderate amounts of the poison will have negligible effects on either.

This kind of bayes poison is relatively low toxicity to SA's bayes engine. It most heavily targets bayes systems which are strictly body-text-words based in their tokenization. This is not how the SA tokenizer works, as it also tokenizes headers, among other factors.
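
To illustrate the difference, here is a minimal sketch of a body-only tokenizer next to one that also emits header tokens. This is not SA's actual tokenizer: the function names, the regexes and the "Hname:word" prefix format are all assumptions made up for the example.

```python
import re

def body_tokens(message):
    """Tokenize only the body (everything after the first blank line)."""
    _, _, body = message.partition("\n\n")
    return set(re.findall(r"[a-z']+", body.lower()))

def header_and_body_tokens(message):
    """Body tokens plus one prefixed token per word found in each header."""
    headers, _, _ = message.partition("\n\n")
    tokens = body_tokens(message)
    for line in headers.splitlines():
        name, _, value = line.partition(":")
        for word in re.findall(r"[\w.@-]+", value.lower()):
            # Hypothetical prefix so header words never collide with body words.
            tokens.add(f"H{name.strip().lower()}:{word}")
    return tokens

# Made-up message: spammy headers, body stuffed with "educated" poison words.
msg = (
    "From: seller@example.com\n"
    "Subject: cheap meds\n"
    "\n"
    "epistemology hermeneutics dialectic"
)

print(sorted(body_tokens(msg)))
print(sorted(header_and_body_tokens(msg)))
```

The body-only tokenizer sees nothing but the poison words, while the second one still has header tokens to learn from, and the poison text cannot change those.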

So your SA bayes system can tolerate a moderate amount of your total input being bayes poison of this sort. If enough of your spam mail contained this poison, particularly if your normal email typically contains words of the sort used by the poison, you'd start seeing problems in the form of reduced bayes effectiveness.

Training the poison makes it more likely (albeit only slightly) that a nonspam message will get a higher score. Words which aren't commonly used in ham but exist in the poison pose the greatest risk here. If your training corpus is run-of-the-mill "common folk" English, and someone later emails you an excerpt of a doctoral dissertation, the poison containing lots of "high dollar" words is going to increase the bayes score of this email.

However, not training the poison makes the poison effective at evading your bayes detection. A few of those "high dollar" words are likely to have strong nonspam probabilities, giving the spam a reduced overall score.
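
The tradeoff in the last two paragraphs can be sketched with a toy classifier. This uses simple Graham-style naive combining, not SA's actual scoring, and every count, word and corpus size below is invented for illustration.

```python
from math import prod

N_SPAM, N_HAM = 100, 100  # hypothetical corpus sizes, held fixed for simplicity

def token_prob(spam_count, ham_count):
    """Per-token spam probability, clamped away from 0 and 1."""
    if spam_count + ham_count == 0:
        return 0.5  # unseen token: treated as neutral in this sketch
    s = spam_count / N_SPAM
    h = ham_count / N_HAM
    return min(0.99, max(0.01, s / (s + h)))

def score(tokens, counts):
    """Naive combination of the per-token probabilities into one score."""
    probs = [token_prob(*counts.get(t, (0, 0))) for t in tokens]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)

# counts[token] = (messages seen in spam, messages seen in ham)
untrained = {
    "viagra": (80, 0),
    "epistemology": (0, 2),   # "high dollar" word, so far seen only in ham
    "hermeneutics": (0, 2),
    "thesis": (0, 10),
}
# Same database after 20 poisoned spams were trained as spam.
trained = dict(untrained, epistemology=(20, 2), hermeneutics=(20, 2))

poisoned_spam = ["viagra", "epistemology", "hermeneutics"]
scholarly_ham = ["thesis", "epistemology", "hermeneutics"]

for name, db in (("untrained", untrained), ("trained", trained)):
    print(name,
          "spam score:", round(score(poisoned_spam, db), 3),
          "ham score:", round(score(scholarly_ham, db), 3))
```

With the poison untrained, the hammy "high dollar" words drag the poisoned spam's score down. With it trained, the spam scores high again, but the scholarly ham's score rises too.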

Ultimately, if you've got enough input of spam and nonspam, the only thing that bayes poison is going to do in the long run is increase the size of your database with tokens that have a near 0.50 probability. Over time the poison and the occasional "big worded" ham will balance most of these tokens to a "neutral" standing, and they won't have much overall effect on score.
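
As a rough illustration of that balancing, assume a poison word that shows up in 5% of spam while its rate in ham slowly climbs toward the same level. The formula is the simple Graham-style ratio, not SA's own calculation, and the counts are made up.

```python
def token_prob(spam_count, ham_count, n_spam, n_ham):
    """Per-token spam probability from relative frequencies."""
    s = spam_count / n_spam
    h = ham_count / n_ham
    return s / (s + h)

N_SPAM = N_HAM = 1000   # hypothetical corpus sizes
SPAM_HITS = 50          # the poison word appears in 5% of trained spam

# The word's count in ham grows as "big worded" ham gets trained.
history = [token_prob(SPAM_HITS, ham_hits, N_SPAM, N_HAM)
           for ham_hits in (10, 25, 50)]

for p in history:
    print(round(p, 3))
```

As the ham rate approaches the spam rate, the token's probability drifts toward 0.5, where it contributes almost nothing to the combined score.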

