Matt Kettler said:
> I expected the HTML parser might deal with the comment block attack. Will
> it also deal with the "white-on-white text" variant? (you didn't include
> both scenarios from my original email, so I'm adding the other back in below)
I don't think it does now, from what I recall.
Justin,
I expected the HTML parser might deal with the comment block attack. Will
it also deal with the "white-on-white text" variant? (you didn't include
both scenarios from my original email, so I'm adding the other back in below).
I wrote previously:
If you strip HTML prior to bayes this cou
Matt Kettler said:
> As a counterargument to this, what about HTML messages being abused to
> bypass bayes when only looking at the top N lines? (note: I think this is on
> the right track in principle, but I can see some resulting holes)
>
> The spammer could now bypass bayes by inserting an H
On Wed, Nov 20, 2002 at 11:16:52AM -0500, Sean Redmond wrote:
> Maybe the weakness is in converting a probability to a score. Quoting
> Paul Graham again:
>
> But the real advantage of the Bayesian approach, of course, is
> that you know what you're measuring. Feature-recognizing filters like
>
As a counterargument to this, what about HTML messages being abused to
bypass bayes when only looking at the top N lines? (note: I think this is on
the right track in principle, but I can see some resulting holes)
The spammer could now bypass bayes by inserting an HTML comment at the
beginning c
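For the comment-block variant, "strip HTML prior to bayes" can be sketched with Python's standard html.parser: keep only character data and drop comment bodies before tokenizing. This is a minimal illustration under that assumption, not SpamAssassin's actual HTML parser; VisibleTextExtractor and visible_text are hypothetical names. It does nothing about the white-on-white variant, because colour-hidden text is still ordinary character data (and script/style contents would also be kept).

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collect character data, ignore tags and comments."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Text between tags is kept for the tokenizer.
        self.chunks.append(data)

    def handle_comment(self, data):
        # <!-- ... --> bodies arrive here and are deliberately discarded,
        # so a padding comment never reaches the Bayes tokenizer.
        pass


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any trailing buffered text
    # Join text runs with spaces and collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


print(visible_text("<!-- lots of innocent text --><p>Buy now</p>"))
```

Joining text runs with spaces is a design choice: it keeps "<p>Hello</p><p>World</p>" as two words, at the cost of splitting a word broken by inline tags.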
"Michael Moncur" <[EMAIL PROTECTED]> writes:
> A more nefarious method would be to use a creative misspelling algorithm on
> the spam text itself to make any potential spam token into an unknown token:
>
> Maek monny fasst! Kall us now to find out the sekrit to mass emial
> marketting techniques t
At 01:24 PM 11/20/2002 -0600, Bob Apthorpe wrote:
Hi,
On Wed, 20 Nov 2002, Christopher Eykamp wrote:
> At 04:40 PM 11/20/2002 +, Matt Sergeant wrote:
> >Argh. Lingo breakage. I meant probability. The way bayes works is you get
> >all the probabilities and combine them. So you have something
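One common way to "combine them" is the formula from Paul Graham's "A Plan for Spam", which treats per-token probabilities as independent evidence: P = (p1...pn) / (p1...pn + (1-p1)...(1-pn)). The sketch below assumes that formula (SpamAssassin's and spambayes' own combining rules may differ) and, as Graham does, clamps each probability to [0.01, 0.99] so a single 1.0 token cannot force the result. The name combine is hypothetical.

```python
from math import prod


def combine(probs, lo=0.01, hi=0.99):
    """Graham-style combination of per-token spam probabilities.

    Intended for a small number of tokens (Graham uses 15); with
    hundreds of tokens the products can underflow to 0.
    """
    clamped = [min(hi, max(lo, p)) for p in probs]
    spam = prod(clamped)                   # p1 * p2 * ... * pn
    ham = prod(1.0 - p for p in clamped)   # (1-p1) * ... * (1-pn)
    return spam / (spam + ham)


print(combine([0.9, 0.9]))    # two moderately spammy tokens reinforce each other
print(combine([0.99, 0.01]))  # opposite evidence cancels out to about 0.5
```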
Hi,
On Wed, 20 Nov 2002, Christopher Eykamp wrote:
> At 04:40 PM 11/20/2002 +, Matt Sergeant wrote:
> >Argh. Lingo breakage. I meant probability. The way bayes works is you get
> >all the probabilities and combine them. So you have something like:
> >
> >1.0 => html_attr_style: bgcolor: whit
At 04:40 PM 11/20/2002 +, Matt Sergeant wrote:
Sean Redmond said the following on 20/11/02 16:16:
Matt Sergeant wrote:
Plus, their pitch would be so buried in all the fluff that you
wouldn't be able to find it unless they made the linuxy text very
small or white-on-white or clear, an
Sean Redmond said the following on 20/11/02 16:16:
Matt Sergeant wrote:
Disclaimers are so common I don't think they would be considered in
the calculation, right?
Wrong. How do you delimit them? I see all sorts here at work. Some up to
150 lines, including at the top and at the bottom. There'
Matt Sergeant writes:
> No, it doesn't. It puts it into the "unknown" category. I assume
> SpamAssassin's implementation is using the same rules as spambayes,
> which means unknown words get a probability of 0.5.
This makes me think of a more automatable way spammers could perform this
attack: in
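A simplified per-token probability shows why an unknown word is neutral: a token never seen in training gets the 0.5 default described above, so a misspelled "sekrit" counts for nothing either way. This sketch omits Graham's refinements (for example, counting ham occurrences double and ignoring rare tokens), assumes both corpus sizes are non-zero, and uses hypothetical function and argument names.

```python
def token_probability(token, spam_counts, ham_counts, nspam, nham, unknown=0.5):
    """Simplified spam probability for one token.

    spam_counts / ham_counts map token -> number of training messages
    containing it; nspam / nham are the (non-zero) corpus sizes.
    """
    b = spam_counts.get(token, 0)
    g = ham_counts.get(token, 0)
    if b + g == 0:
        return unknown  # never seen: neither spammy nor hammy
    spam_freq = b / nspam
    ham_freq = g / nham
    p = spam_freq / (spam_freq + ham_freq)
    return min(0.99, max(0.01, p))  # keep away from 0 and 1


spam_counts = {"secret": 10}
ham_counts = {"mod_perl": 20}
print(token_probability("sekrit", spam_counts, ham_counts, 100, 100))
```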
Matt Sergeant wrote:
Sean Redmond said the following on 20/11/02 15:25:
> But this is where the personalization of the corpus is important,
> because *I* never get football related mail, so that makes it
> suspicious right there.
No, it doesn't. It puts it into the "unknown" category. I assum
Sean Redmond said the following on 20/11/02 15:25:
Matt Sergeant wrote:
Also, as I understand his explanation, only the most interesting tokens
are considered in calculating the likelihood that it's spam, so
watering down the body of the message should only make the
interesting things more interes
Matt Sergeant wrote:
> Also, as I understand his explanation, only the most interesting tokens
> are considered in calculating the likelihood that it's spam, so
> watering down the body of the message should only make the
> interesting things more interesting.
But Graham's analysis is wrong here.
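One way to see the objection: Graham keeps only the 15 token probabilities farthest from 0.5. The sketch below assumes that selection rule (most_interesting is a hypothetical name). Padding with neutral words changes nothing, but padding with words the recipient's own corpus rates as hammy, such as text lifted from a mailing list they read, pushes the moderately spammy tokens out of the selection.

```python
def most_interesting(probs, n=15):
    """Return the n probabilities farthest from the neutral value 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


spam_tokens = [0.99] * 5 + [0.9] * 10

# Neutral padding (p = 0.5) never makes it into the selection.
print(most_interesting(spam_tokens + [0.5] * 1000))

# "Linuxy" padding that this user's corpus rates as ham (p = 0.02)
# is more extreme than the 0.9 tokens, so it replaces them.
print(most_interesting(spam_tokens + [0.02] * 20))
```

In the second case the selection is five 0.99 tokens and ten 0.02 tokens, which Graham's combining formula turns into a value very close to 0, i.e. ham.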
Sean Redmond said the following on 19/11/02 21:42:
Assuming they could solve the problem of the headers, the spam of the
future will probably look something like this:
Hey there. Thought you should check out the following:
http://www.27meg.com/foo
because that is about as much sales pitch as c
Ross Vandegrift wrote:
On Tue, Nov 19, 2002 at 05:14:46PM +, Justin Mason wrote:
>I notice the bayes-busting spam posted on spambayes used linuxy text.
>That probably works quite well given the current bayes userset ;)
Ah, that's friggin genius! And it's almost a perfectly crafted
att
> "RV" == Ross Vandegrift <[EMAIL PROTECTED]> writes:
RV> On Tue, Nov 19, 2002 at 05:14:46PM +, Justin Mason wrote:
>> I notice the bayes-busting spam posted on spambayes used linuxy text.
>> That probably works quite well given the current bayes userset ;)
RV> Ah, that's friggin geni
On Tue, Nov 19, 2002 at 09:54:13AM -0600, Bob Apthorpe wrote:
> > What does "joint and conditional frequency analysis" mean?
>
> First, start with Larry Gonick's fantastic "The Cartoon Guide To Statistics":
> http://www.powells.com/cgi-bin/biblio?inkey=7-0062731025-0
>
> Being neither a mathemati
On Tue, Nov 19, 2002 at 05:14:46PM +, Justin Mason wrote:
> I notice the bayes-busting spam posted on spambayes used linuxy text.
> That probably works quite well given the current bayes userset ;)
Ah, that's friggin genius! And it's almost a perfectly crafted
attack on Bayes-like filteri
Matt Sergeant said:
> OK, I think I now understand, but I still think there's an attack here.
> Imagine you tally where you got the email from with the text you take to
> defeat the filter? So you farm addresses from the mod_perl list at
> mail-archive.org, and the bayesian filter text is also
Bob Apthorpe said the following on 19/11/02 15:54:
First, start with Larry Gonick's fantastic "The Cartoon Guide To Statistics":
http://www.powells.com/cgi-bin/biblio?inkey=7-0062731025-0
I shall try and get hold of that :-)
[OT: I have the "Cartoon History of Time", which looks similar in its
Bob Apthorpe wrote:
corpi (sp?)
The plural is "corpora" :-)
--
Sean Redmond
BMA Information Systems
Hi,
On Tue, 19 Nov 2002 14:26:51 GMT, Matt Sergeant wrote:
> Ross Vandegrift said the following on 19/11/02 14:17:
> > If the Bayesian analysis actually takes into account the joint and
> > conditional densities of word frequency, and it has a reasonable way to
> > assign an expectation to them (
Ross Vandegrift said the following on 19/11/02 14:17:
On Tue, Nov 19, 2002 at 09:39:13AM +, Matt Sergeant wrote:
The spammers have. An even better way they've found is to include a
snippet from a legit mailing list, but put it in a white text on white
background box. This was discussed on
On Tue, Nov 19, 2002 at 09:39:13AM +, Matt Sergeant wrote:
> The spammers have. An even better way they've found is to include a
> snippet from a legit mailing list, but put it in a white text on white
> background box. This was discussed on the spambayes mailing list.
Now, I am not a statis
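Matt Sergeant's "html_attr_style: bgcolor: ..." token suggests another angle on the white-on-white box: turn HTML attributes into tokens, so the filter may learn that certain colour settings are common in spam. The token format here (html_attr_NAME:VALUE) is a hypothetical one loosely modelled on that example, not SpamAssassin's actual tokenizer, and legitimate HTML mail that uses white would produce the same tokens.

```python
from html.parser import HTMLParser


class AttributeTokenizer(HTMLParser):
    """Hypothetical tokenizer: one token per HTML attribute with a value."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases attribute names; values are
        # normalised here so "#FFFFFF" and "#ffffff" become one token.
        for name, value in attrs:
            if value is not None:
                self.tokens.append(f"html_attr_{name}:{value.strip().lower()}")


def attribute_tokens(html):
    tok = AttributeTokenizer()
    tok.feed(html)
    tok.close()
    return tok.tokens


box = '<table bgcolor="#FFFFFF"><tr><td><font color="white">x</font></td></tr></table>'
print(attribute_tokens(box))
```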
Christopher Eykamp said the following on 18/11/02 23:15:
Hello,
I've implemented a Bayesian filtering scheme on my system that runs
concurrently with SpamAssassin. It works really well, but I am starting to
think there is an easy attack that would render the filtering useless.
What if, at the e
Christopher Eykamp said:
> What if, at the end of every message, spammers appended a list of a
> thousand or more randomly selected common dictionary words. Wouldn't these
> words overwhelm a Bayesian filtering scheme? Sure, the spam phrases would
> still be present in the top part of the me