On Fri, 2009-05-15 at 19:27 +0100, Jeremy Morton wrote:
> Karsten Bräckelmann wrote:

> > Backscatter. These types of arbitrarily phrased "I changed my email
> > address" auto-responses are pretty much impossible to catch.
> 
> I feared as much.

> > Since BAYES_00 is a strong sign for ham, I would have at least given it
> > a low negative score, not positive. This is particularly important, since
> > you severely lowered the required_score to a mere 3.0.
> 
> It may be a strong sign for ham, but it's also giving way too much 
> credit to a lot of spam (or at least unwanted backscatter) I'm getting 
> that would otherwise be rejected.  I'll move it to -0.1 but I don't want 
> it being a strong indicator of ham.

On the topic of backscatter in general, not specific to this sample but
regarding the bulk of non-delivery notices and the like:

Are you using vbounce? Did you read those files, in particular the part
on how to handle the hits? It is strongly recommended to filter
vbounce-identified backscatter into a dedicated folder, rather than
raising the score in the hope of treating it as spam.
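
As a rough sketch of that kind of sorting: the snippet below assumes the
vbounce ruleset is active and that its hits show up as ANY_BOUNCE_MESSAGE
in the tests= list of the X-Spam-Status header (check your own headers).
The folder names and helper functions are made up for illustration. In
practice your delivery agent would do this job, not a Python script.

```python
import re
from email import message_from_string

# Assumption: this is the rule name your vbounce ruleset reports.
BOUNCE_RULE = "ANY_BOUNCE_MESSAGE"


def rule_hits(raw_message):
    """Return the set of rule names listed in the X-Spam-Status header."""
    msg = message_from_string(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    # The tests= list may be folded over several lines; it ends at the
    # next "key=" field (e.g. autolearn=) or at the end of the header.
    match = re.search(r"tests=(.*?)(?=\s+[a-z]+=|$)", status, re.S)
    if match is None:
        return set()
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


def target_folder(raw_message):
    """Pick a (hypothetical) folder without touching the spam score."""
    if BOUNCE_RULE in rule_hits(raw_message):
        return "backscatter"
    return "inbox"
```

The point of the design: the bounce goes to its own folder based on the
rule hit alone, and the score (and with it Bayes) is left untouched.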


> > Moreover, this is not spam. Thus I recommend you pretty much ignore the
> > Bayes score here. Don't change the rule's score based on backscatter,
> > but ham and spam hits, if need be.

Let me stress this point. Do NOT customize your Bayes scores based on
backscatter, but exclusively on ham and spam.


> It's unwanted e-mail, so it's pretty close to spam in my book.  Just 
> because it's some moron who bounced a message instead of someone 
> explicitly spamming me doesn't make it much better.

So is malware spreading through email. Yet it isn't spam, however close.
And there are better tools, specifically designed to catch them.


> > Your Bayes *might* be skewed. Hard to tell from that sample. Do you
> > train it, manually? Would Spanish be a language you do get in ham?
> 
> No and no.

Please do not complain about bad Bayes results if you don't take care
of your Bayes database and train it properly. :)

> But all the character glyphs in the message could be used in 
> English or French which I might get ham messages in, so it can't be 
> ruled out on those grounds.

We're talking Bayes here, so we're talking tokens. Not chars. Think of
them as words.
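
A toy illustration of the difference, assuming a deliberately naive
tokenizer that just splits on word characters (the real SA tokenizer is
more elaborate, and the two sample strings are made up):

```python
import re


def tokens(text):
    """Naive stand-in for a Bayes tokenizer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def letters(text):
    """The set of letters used, ignoring spaces and punctuation."""
    return {ch for ch in text.lower() if ch.isalpha()}


english = "I have changed my email"
spanish = "He cambiado mi correo"

shared_letters = letters(english) & letters(spanish)
shared_tokens = tokens(english) & tokens(spanish)

print("shared letters:", sorted(shared_letters))
print("shared tokens:", sorted(shared_tokens))
```

Both strings draw on much the same alphabet, yet they have no token in
common. Bayes learns from the tokens, so which glyphs a message uses
tells it next to nothing.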


> > Also, again -- you are suffering from your catch-all!  See my previous
> > post (in one of your various threads) for some thoughts regarding this.
> 
> Yeah, I'm the greatest lamenter of my decision to catch-all years ago, 
> but there's really no realistic way I'm gonna be able to go back on 
> that.  I've probably registered with various sites using over 100 
> usernames now.  I'm just gonna have to live with that.

Well, I've given hints for custom rules to catch the bulk of catch-all
mail sent to addresses you cannot possibly have used. Take it or leave
it. *shrug*
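
The underlying idea, as a minimal sketch only: this is plain Python, not
SA rule syntax, and the address list and function names are hypothetical.
Keep a list of the local parts you actually handed out, and treat
anything else arriving at the catch-all as suspect.

```python
# Hypothetical examples of local parts that were really given to sites.
USED_LOCAL_PARTS = {"jeremy", "amazon", "ebay"}


def local_part(address):
    """Return the lowercased part before the last '@'."""
    return address.rsplit("@", 1)[0].strip().lower()


def never_used(recipient, used=USED_LOCAL_PARTS):
    """True if the recipient's local part was never handed out."""
    return local_part(recipient) not in used


for rcpt in ("Amazon@example.org", "qwxzy@example.org"):
    print(rcpt, "-> never used" if never_used(rcpt) else "-> known address")
```

Even with a hundred or so usernames, such a list stays short, and mail
to anything not on it can be scored or sorted separately.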

I just noted that all your recent samples are backscatter to addresses
you never ever used, consisting of arbitrary auto-response strings in
various foreign languages.


IMHO, frankly, it appears to me you are trying to use SA to combat a
design-inherent problem that would be better solved at a much lower level.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
