Re: Arabic Spam

Karsten Bräckelmann Sat, 29 May 2010 14:23:51 -0700

> > Not as far as ok_locales and the respective CHARSET_FARAWAY rules are
> > concerned, IIRC. They have been written long ago to trigger on the
> > char-sets used. They don't detect the char-set based on the actual
> > payload.
> 
> So where does that leave us?  With the need for an update or addition to 
> the FARAWAY rules?


IMHO this code is simply approaching the end of its usefulness. Times
have changed.

It is designed to trigger on unwanted (unreadable) charsets. It is not
designed to detect languages. Thing is, UTF-8 is a charset, and one
that can carry text in just about any language.

In the age of UTF-8, a new approach seems promising: identifying the
predominant sub-set of characters a message uses. The results would be
similar in nature to the old charset approach, basically deciding
whether there's a chance I could even read the text...
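
To illustrate the idea, here is a rough sketch in Python. It is not
SpamAssassin code and not how any existing rule works. It assumes the
message body is already decoded to Unicode text, and it approximates the
"sub-set" of a character by the first word of its Unicode character name
(LATIN, ARABIC, CYRILLIC, ...), which is a crude stand-in for a real
script property. The function names, the set of accepted sub-sets, and
the 0.5 threshold are all made up for this example.

```python
import unicodedata
from collections import Counter


def predominant_subset(text):
    """Return (subset, share) for the most common sub-set among letters.

    The sub-set is approximated by the first word of the Unicode
    character name, e.g. 'ARABIC' for 'ARABIC LETTER MEEM'.
    Returns (None, 0.0) if the text contains no named letters.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return None, 0.0
    subset, hits = counts.most_common(1)[0]
    return subset, hits / sum(counts.values())


def probably_readable(text, ok_subsets=frozenset({"LATIN"}), threshold=0.5):
    """Hypothetical check: False if an unwanted sub-set dominates the text."""
    subset, share = predominant_subset(text)
    if subset is None:
        return True  # nothing to judge, do not penalize
    return subset in ok_subsets or share < threshold
```

Note that the charset label never enters into it: both an English and an
Arabic message could be labeled UTF-8, yet only the second one would
fail probably_readable() for someone who accepts LATIN only.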


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
