Re: [SAtalk] Bayesian attack

2002-11-22 Thread Justin Mason
Matt Kettler said: > I expected the HTML parser might deal with the comment block attack. Will > it also deal with the "white-on-white text" variant? (you didn't include > both scenarios from my original email, so I'm adding the other back in below) don't think it does now, from what I recall

Re: [SAtalk] Bayesian attack

2002-11-21 Thread Matt Kettler
Justin, I expected the HTML parser might deal with the comment block attack. Will it also deal with the "white-on-white text" variant? (you didn't include both scenarios from my original email, so I'm adding the other back in below). I wrote previously: If you strip HTML prior to bayes this cou

Re: [SAtalk] Bayesian attack

2002-11-21 Thread Justin Mason
Matt Kettler said: > As a counter argument of this, what about HTML messages being abused to > bypass bayes when only looking at the top N lines? (note: think this is on > the right track in principle, but I can see some resulting holes) > > The spammer could now bypass bayes by inserting a H

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Ross Vandegrift
On Wed, Nov 20, 2002 at 11:16:52AM -0500, Sean Redmond wrote: > Maybe the weakness is in converting a probability to a score. Quoting > Paul Graham again: > > But the real advantage of the Bayesian approach, of course, is > that you know what you're measuring. Feature-recognizing filters like >

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Matt Kettler
As a counter argument of this, what about HTML messages being abused to bypass bayes when only looking at the top N lines? (note: think this is on the right track in principle, but I can see some resulting holes) The spammer could now bypass bayes by inserting a HTML comment at the beginning c
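Matt's concern is a filter that reads only the top N lines, where a leading comment block can fill the whole window. Below is a minimal sketch of stripping comments before counting lines. It assumes a plain regex is good enough, which is not true of a real HTML parser. `leading_lines` and its parameters are hypothetical names, not SpamAssassin code.

```python
import re

# Matches an HTML comment, including one spanning several lines (DOTALL).
# A regex is a simplification; it does not understand <script> blocks etc.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

def leading_lines(body: str, n: int, strip_comments: bool = True) -> list:
    """Return the first n non-blank lines of body, optionally after
    removing HTML comments so they cannot fill the top-N window."""
    if strip_comments:
        body = COMMENT_RE.sub("", body)
    lines = [line for line in body.splitlines() if line.strip()]
    return lines[:n]

msg = "<!--\nfiller one\nfiller two\n-->\nMake money fast!\nClick here\n"
print(leading_lines(msg, 2, strip_comments=False))  # only sees the filler
print(leading_lines(msg, 2))                        # sees the actual pitch
```

This covers only the comment-block variant. Text hidden with white-on-white styling survives comment stripping, which is the follow-up question Matt raises elsewhere in this thread.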

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Graham Murray
"Michael Moncur" <[EMAIL PROTECTED]> writes: > A more nefarious method would be to use a creative misspelling algorithm on > the spam text itself to make any potential spam token into an unknown token: > > Maek monny fasst! Kall us now to find out the sekrit to mass emial > marketting techniques t

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Christopher Eykamp
At 01:24 PM 11/20/2002 -0600, Bob Apthorpe wrote: Hi, On Wed, 20 Nov 2002, Christopher Eykamp wrote: > At 04:40 PM 11/20/2002 +, Matt Sergeant wrote: > >Argh. Lingo breakage. I meant probability. The way bayes works is you get > >all the probabilities and combine them. So you have something

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Bob Apthorpe
Hi, On Wed, 20 Nov 2002, Christopher Eykamp wrote: > At 04:40 PM 11/20/2002 +, Matt Sergeant wrote: > >Argh. Lingo breakage. I meant probability. The way bayes works is you get > >all the probabilities and combine them. So you have something like: > > > >1.0 => html_attr_style: bgcolor: whit
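Matt's "you get all the probabilities and combine them" can be made concrete. Below is a minimal sketch of the combination rule from Paul Graham's "A Plan for Spam", including his clamping of each token probability to [0.01, 0.99]. SpamAssassin's own Bayes code may combine differently, and `combine` is a hypothetical name. Like any naive Bayes combiner, it treats tokens as independent of each other.

```python
from math import prod

def combine(probs, lo=0.01, hi=0.99):
    """Combine per-token spam probabilities Graham-style:
    P = prod(p) / (prod(p) + prod(1 - p)).
    Each p is clamped to [lo, hi] so that a single 1.0 or 0.0 token
    cannot force the result outright (or produce 0/0)."""
    clamped = [min(hi, max(lo, p)) for p in probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)

print(combine([1.0, 0.99, 0.2]))  # strong spam tokens outweigh one hammy token
print(combine([0.5, 0.5]))        # neutral tokens leave the score at 0.5
```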

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Christopher Eykamp
At 04:40 PM 11/20/2002 +, Matt Sergeant wrote: Sean Redmond said the following on 20/11/02 16:16: Matt Sergeant wrote: Plus, their pitch would be so buried in all the fluff that you wouldn't be able to find it unless they made the linuxy text very small or white-on-white or clear, an

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Matt Sergeant
Sean Redmond said the following on 20/11/02 16:16: Matt Sergeant wrote: Disclaimers are so common I don't think they would be considered in the calculation, right? Wrong. How do you delimit them? I see all sorts here at work. Some up to 150 lines, including at the top and at the bottom. There'

RE: [SAtalk] Bayesian attack

2002-11-20 Thread Michael Moncur
Matt Sergeant writes: > No, it doesn't. It puts it into the "unknown" category. I assume > SpamAssassin's implementation is using the same rules as spambayes, > which means unknown words get a probability of 0.5. This makes me think of a more automatable way spammers could perform this attack: in
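The misspelling idea ("Maek monny fasst") works because a token the filter has never seen carries no evidence. Below is a minimal sketch of the 0.5-for-unknown rule cited above. The probability table is invented for illustration, and `token_probs` and `TABLE` are hypothetical names.

```python
import re

UNKNOWN_PROB = 0.5  # value for never-seen tokens, per the spambayes rule cited here

def token_probs(text, table):
    """Look up a spam probability for each lower-cased word token.
    Tokens missing from the trained table fall back to UNKNOWN_PROB."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {t: table.get(t, UNKNOWN_PROB) for t in tokens}

# Hypothetical trained probabilities, for illustration only.
TABLE = {"make": 0.7, "money": 0.95, "fast": 0.9, "secret": 0.85}

print(token_probs("Make money fast", TABLE))   # known, spammy tokens
print(token_probs("Maek monny fasst", TABLE))  # every token falls back to 0.5
```

Every misspelled token lands at exactly 0.5, so it contributes nothing in either direction. The cost to the spammer is legibility, and repeated misspellings become ordinary spam evidence once they are trained.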

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Sean Redmond
Matt Sergeant wrote: Sean Redmond said the following on 20/11/02 15:25: > But this is where the personalization of the corpus is important, > because *I* never get football related mail, so that makes it > suspicious right there. No, it doesn't. It puts it into the "unknown" category. I assum

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Matt Sergeant
Sean Redmond said the following on 20/11/02 15:25: Matt Sergeant wrote: Also I understand his explanation, only the most interesting tokens are considered in calculating the likelihood that it's spam, so watering down the body of the message should only make the interesting things more interes

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Sean Redmond
Matt Sergeant wrote: > Also I understand his explanation, only the most interesting tokens > are considered in calculating the likelihood that it's spam, so > watering down the body of the message should only make the > interesting things more interesting. But Graham's analysis is wrong here.
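The "most interesting tokens" rule Matt describes can be sketched as follows. Only the n probabilities farthest from 0.5 are kept, and n=15 is the figure Graham used. `most_interesting` is a hypothetical helper, and the probabilities are invented.

```python
def most_interesting(probs, n=15):
    """Keep the n probabilities farthest from the neutral 0.5
    (Graham-style token selection)."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]

spam_tokens = [0.99, 0.98, 0.95]
padding = [0.5] * 1000  # neutral "fluff": unknown or evenly split words
print(most_interesting(spam_tokens + padding)[:3])  # the spam tokens still lead
```

This holds only for padding near 0.5. Padding made of words the recipient's corpus treats as strongly non-spam competes for the same n slots.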

Re: [SAtalk] Bayesian attack

2002-11-20 Thread Matt Sergeant
Sean Redmond said the following on 19/11/02 21:42: Assuming they could solve the problem of the headers, the spam of the future will probably look something like this: Hey there. Thought you should check out the following: http://www.27meg.com/foo because that is about as much sales pitch as c

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Sean Redmond
Ross Vandegrift wrote: On Tue, Nov 19, 2002 at 05:14:46PM +, Justin Mason wrote: >I notice the bayes-busting spam posted on spambayes used linuxy text. >That probably works quite well given the current bayes userset ;) Ah, that's friggin genius! And it's almost a perfectly crafted att

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Vivek Khera
> "RV" == Ross Vandegrift <[EMAIL PROTECTED]> writes: RV> On Tue, Nov 19, 2002 at 05:14:46PM +, Justin Mason wrote: >> I notice the bayes-busting spam posted on spambayes used linuxy text. >> That probably works quite well given the current bayes userset ;) RV> Ah, that's friggin geni

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Ross Vandegrift
On Tue, Nov 19, 2002 at 09:54:13AM -0600, Bob Apthorpe wrote: > > What does "joint and conditional frequency analysis" mean? > > First, start with Larry Gonick's fantastic "The Cartoon Guide To Statistics": > http://www.powells.com/cgi-bin/biblio?inkey=7-0062731025-0 > > Being neither a mathemati

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Ross Vandegrift
On Tue, Nov 19, 2002 at 05:14:46PM +, Justin Mason wrote: > I notice the bayes-busting spam posted on spambayes used linuxy text. > That probably works quite well given the current bayes userset ;) Ah, that's friggin genius! And it's almost a perfectly crafted attack on Bayes-like filteri
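Justin's observation explains why linuxy text suits this user base. For these filters, such words are strongly non-spam rather than neutral, so they displace the pitch tokens among the most interesting ones. Below is a minimal sketch with invented probabilities: 0.99 for the pitch and 0.02 for "linuxy" words. It reuses Graham-style selection and combination under hypothetical helper names.

```python
from math import prod

def most_interesting(probs, n=15):
    # Graham-style selection: the n probabilities farthest from 0.5.
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]

def combine(probs):
    # Graham-style combination of (already clamped) token probabilities.
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

pitch = [0.99, 0.99, 0.99]  # three strong spam tokens
neutral = [0.5] * 200       # dictionary words with no history
linuxy = [0.02] * 200       # words this user only ever sees in ham

print(combine(most_interesting(pitch + neutral)))  # stays close to 1
print(combine(most_interesting(pitch + linuxy)))   # collapses toward 0
```

With neutral padding, the 12 spare slots hold 0.5 values that cancel out. With hammy padding, they hold twelve strong ham votes against three spam votes.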

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Justin Mason
Matt Sergeant said: > OK, I think I now understand, but I still think there's an attack here. > Imagine you tally where you got the email from with the text you take to > defeat the filter? So you farm addresses from the mod_perl list at > mail-archive.org, and the bayesian filter text is also

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Matt Sergeant
Bob Apthorpe said the following on 19/11/02 15:54: First, start with Larry Gonick's fantastic "The Cartoon Guide To Statistics": http://www.powells.com/cgi-bin/biblio?inkey=7-0062731025-0 I shall try and get hold of that :-) [OT: I have the "Cartoon History of Time", which looks similar in its

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Sean Redmond
Bob Apthorpe wrote: corpi (sp?) The plural is "corpora" :-) -- Sean Redmond BMA Information Systems

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Bob Apthorpe
Hi, On Tue, 19 Nov 2002 14:26:51 GMT, Matt Sergeant wrote: > Ross Vandegrift said the following on 19/11/02 14:17: > > If the Bayesian analysis actually takes into account the joint and > > conditional densities of word frequency, and it has a reasonable way to > > assign an expectation to them (

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Matt Sergeant
Ross Vandegrift said the following on 19/11/02 14:17: On Tue, Nov 19, 2002 at 09:39:13AM +, Matt Sergeant wrote: The spammers have. An even better way they've found is to include a snippet from a legit mailing list, but put it in a white text on white background box. This was discussed on

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Ross Vandegrift
On Tue, Nov 19, 2002 at 09:39:13AM +, Matt Sergeant wrote: > The spammers have. An even better way they've found is to include a > snippet from a legit mailing list, but put it in a white text on white > background box. This was discussed on the spambayes mailing list. Now, I am not a statis
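One counter raised in this thread is to feed HTML markup to the filter as tokens, as in Matt's "html_attr_style: bgcolor: whit..." example, so that the hiding trick itself becomes evidence. Below is a minimal sketch using Python's standard `html.parser`. The `html_attr:<tag>:<name>=<value>` token format and the class and function names are hypothetical, not SpamAssassin's.

```python
from html.parser import HTMLParser

class AttrTokenizer(HTMLParser):
    """Collect lower-cased words of text content plus one token per
    HTML attribute (hypothetical format html_attr:<tag>:<name>=<value>)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes.
        for name, value in attrs:
            self.tokens.append(f"html_attr:{tag}:{name}={(value or '').lower()}")

    def handle_data(self, data):
        self.tokens.extend(data.lower().split())

def tokenize_html(html):
    parser = AttrTokenizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.tokens

print(tokenize_html('<table bgcolor="White"><tr><td>'
                    '<font color="white">mod_perl rocks</font>'
                    '</td></tr></table>Buy now'))
```

Comments are dropped without extra work, because `HTMLParser` routes them to `handle_comment`, which this class does not override. The sketch does not compare text and background colours. Whether a token such as `html_attr:font:color=white` ends up spammy is left to training.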

Re: [SAtalk] Bayesian attack

2002-11-19 Thread Matt Sergeant
Christopher Eykamp said the following on 18/11/02 23:15: Hello, I've implemented a Bayesian filtering scheme on my system that runs concurrent with SpamAssassin. It works really well, but I am starting to think there is an easy attack that would render the filtering useless. What if, at the e

Re: [SAtalk] Bayesian attack

2002-11-18 Thread Justin Mason
Christopher Eykamp said: > What if, at the end of every message, spammers appended a list of a > thousand or more randomly selected common dictionary words. Wouldn't these > words overwhelm a Bayesian filtering scheme? Sure, the spam phrases would > still be present in the top part of the me