Bob Apthorpe said the following on 19/11/02 15:54:
First, start with Larry Gonick's fantastic "The Cartoon Guide To Statistics": http://www.powells.com/cgi-bin/biblio?inkey=7-0062731025-0
I shall try and get hold of that :-)
[OT: I have the "Cartoon History of Time", which looks similar in its approach, and is a fantastic read for anyone who found "A Brief History.." a bit too hard going]
Being neither a mathematician nor a statistician, sounds like joint
frequency analysis is how you derive P(A and B), and conditional frequency
analysis is how you generate P(A given B) (generally written P(A|B)),
where A and B are two events (in this case, the occurrence of words A and
B, respectively.) Bayes' Theorem boils down to P(A|B) = P(A and B)/P(B), which is intuitive if you draw the big Venn diagram.
OK, I think I now understand, but I still think there's an attack here. Imagine you match where you got each email address from with the text you use to defeat the filter. So you farm addresses from the mod_perl list at mail-archive.org, and the text meant to fool the Bayesian filter is also taken from that list. The same goes for other lists or web sites.
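To make the P(A|B) = P(A and B)/P(B) relationship concrete, here is a minimal sketch that derives joint and conditional frequencies by counting words in a toy corpus. The messages and the helper names (`p_word`, `p_joint`, `p_given`) are made up for illustration. Strictly speaking, this formula is the definition of conditional probability; Bayes' Theorem follows from applying it in both directions, which the last line checks.

```python
from math import isclose

# Toy corpus: each message reduced to its set of words (hypothetical data).
messages = [
    {"free", "toner", "offer"},
    {"free", "offer"},
    {"toner", "invoice"},
    {"meeting", "invoice"},
]

def p_word(a):
    """P(A): share of messages containing word a."""
    return sum(1 for m in messages if a in m) / len(messages)

def p_joint(a, b):
    """P(A and B): share of messages containing both words (joint frequency)."""
    return sum(1 for m in messages if a in m and b in m) / len(messages)

def p_given(a, b):
    """P(A|B): among messages containing b, the share that also contain a."""
    with_b = [m for m in messages if b in m]
    return sum(1 for m in with_b if a in m) / len(with_b)

# Counting directly agrees with P(A and B) / P(B).
assert isclose(p_given("toner", "free"), p_joint("toner", "free") / p_word("free"))

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B).
assert isclose(
    p_given("toner", "free"),
    p_given("free", "toner") * p_word("toner") / p_word("free"),
)
```

Counting within the messages that contain B is the "Venn diagram" view: you shrink the universe to B's circle and ask how much of it overlaps A.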
Regardless, if spammers start including random chunks of legitimate
mailing list traffic, the Bible, Rod McKuen poetry - whatever - it
shouldn't matter since phrases like "This is a one-time mailing" and
"HOT TEENS REFINANCE YOUR TONER CARTRIDGES" still show up only in the
spam corpus. If word combinations show up in both the spam and non-spam
corpora they should end up with a low weight. I don't see random
'hash-busting' text having much effect on the so-called Bayesian filters;
the worst that will happen is that inbound Rod McKuen poetry will be
misclassified as spam, provided the spammers start mailing it to you
before your friends do[1].
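The weighting argument above can be sketched in a few lines. This is a deliberately simplified model and not SpamAssassin's actual implementation: the two tiny corpora, the clamping to [0.01, 0.99], the neutral 0.5 for unseen tokens, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from math import prod, isclose

# Hypothetical training corpora, one string per message.
spam = [
    "this is a one-time mailing refinance now",
    "hot teens refinance your toner cartridges",
]
ham = [
    "the mod_perl handler is called on every request",
    "this patch fixes the handler for every request",
]

def doc_counts(corpus):
    """Count, for each token, the number of messages it appears in."""
    counts = Counter()
    for message in corpus:
        counts.update(set(message.split()))
    return counts

spam_counts, ham_counts = doc_counts(spam), doc_counts(ham)

def token_spamminess(token):
    """Estimate P(spam | token) from per-corpus message frequencies."""
    s = spam_counts[token] / len(spam)
    h = ham_counts[token] / len(ham)
    if s + h == 0:
        return 0.5  # assumption: unseen tokens are treated as neutral
    # Clamp so that a single token can never be absolutely decisive.
    return min(0.99, max(0.01, s / (s + h)))

def message_score(text):
    """Naive Bayes combination of the per-token estimates."""
    probs = [token_spamminess(t) for t in set(text.split())]
    spammy = prod(probs)
    hammy = prod(1 - p for p in probs)
    return spammy / (spammy + hammy)

# "this" appears equally often in both corpora, so it carries no weight,
# and padding a spammy message with it leaves the score unchanged.
assert token_spamminess("this") == 0.5
assert isclose(message_score("refinance toner"),
               message_score("refinance toner this"))
```

In this model, tokens shared by both corpora sit at 0.5 and cancel out of the combined score. Tokens seen only in the non-spam corpus would pull the score down, though, which is the case the targeted-text attack raised in this thread relies on.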
Just periodically analyze the corpora to keep up with current trends in
spam content, and you should be ok.
That probably doesn't scale, but that doesn't mean there's no exploit there.
Matt.
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk