On Sat, 2007-12-08 at 02:05 +0100, Stefan Jakobs wrote:
> On Saturday 08 December 2007 01:15, Karsten Bräckelmann wrote:

> > > Ok. My fault, I mistook charsets for country codes. But replace se with
> > > ru or ch or greek7. The result is the same. You want one charset to be
> > > considered as "not ham" and you have to give the whole list to the
> > > parameter. And I think it is a long list that is ugly to read (see:
> > > http://www.iana.org/assignments/character-sets)
> >
> > Yes, that list indeed is ugly. However, that is *not* what we are
> > talking about. The list of valid locales for ok_locales can be found in
> > the docs -- and totals 6, including en...
> 
> Only 6? Yes, I found it in the docs. (Yeah, I know: RTFM before you ask 
> around). I apologize, with only 6 charsets it is not useful to have a 
> not_ok_locales option.

You just looked at the wrong docs... ;)

Basically, from a user's point of view, the coarse distinction ok_locales
makes boils down to "can I decipher that?". As in, I don't speak Chinese,
and I have a hard time telling Chinese apart from Japanese. I don't speak
Swedish either, but I do recognize the symbols. And with some luck, I'll
even understand a couple words... [1]


> > > I only want to say that there can be a situation in which you only know
> > > that you don't want to consider the XXX charset as an indicator for ham.
> >
> > Despite its name, ok_locales is *not* about certain charsets being "an
> > indicator for ham". The opposite is true. It does not assign a negative
> > score. All it does is assign a positive score for charsets "not in
> > the ok list".
> 
> Maybe I should have said: "an indicator for NOT spam" ? Sh.., there are too 
> many double negations and I'm too tired for that.

not spam == ham

Do you actually mean "not an indicator for ham/spam/anything"? Because
that's what ok_locales is -- whatever is in that list is treated as
neutral, taken as an indicator for neither ham nor spam. Anything that
is *not* in that list, however, is an indicator for spam.

It's a rather twisted logic. You don't define what's good or bad (that
again would be a black/whitelist), you leave out what's bad...
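
To make that concrete, here is a rough sketch of the logic in Python.
It is for illustration only and is not how SpamAssassin implements it:
the function name, the tiny charset-to-locale table and the penalty
value are all made up for this example.

```python
# Illustrative sketch only; NOT SpamAssassin's actual code.
# The table is a hypothetical subset chosen for demonstration.
CHARSET_TO_LOCALE = {
    "iso-8859-1": "en",
    "koi8-r": "ru",
    "iso-2022-jp": "ja",
    "euc-kr": "ko",
    "big5": "zh",
    "tis-620": "th",
}


def locale_score(charset, ok_locales, penalty=1.0):
    """Return spam points for a message charset.

    Listed locales are neutral (0.0). Locales left out of the list
    earn a positive (spam) score. Nothing ever scores negative.
    """
    locale = CHARSET_TO_LOCALE.get(charset.lower())
    if locale is None:
        # Assumption of this sketch: unknown charsets stay neutral.
        return 0.0
    return 0.0 if locale in ok_locales else penalty


if __name__ == "__main__":
    ok = {"en"}
    for cs in ("ISO-8859-1", "KOI8-R", "Big5"):
        print(cs, locale_score(cs, ok))
```

Note there is no branch that rewards a charset. The list can only
switch the penalty off, which is why it is neither a whitelist nor a
blacklist.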


> > Maybe the devs can briefly explain how the charset is being determined.
> > Or at least, where exactly in the code one could find it...

Matt, also, I have a feeling that this logic is what the OP is actually
after. He does not want to leave out what he wants scored, but to
(positively) define it.

  guenther


[1] As someone who has dealt extensively with user-filed bug reports in
    Bugzilla, I know there is a chance to grok the general topic
    even if you don't know the language.

-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
