On Thu, Dec 5, 2013 at 5:29 PM, Mauricio Tavares <raubvo...@gmail.com> wrote:
> On Wed, Nov 27, 2013 at 7:48 PM, Karsten Bräckelmann
> <guent...@rudersport.de> wrote:
>> On Wed, 2013-11-27 at 13:38 -0500, Mauricio Tavares wrote:
>>> Let's say I have
>>>
>>> ok_languages en
>>>
>>> and I get an email from Canada that is mostly in English but for the
>>> little disclaimer on the bottom. How can I tell textcat to only flag
>>> an email if more than some percentage of the body text is not in a
>>> ok_languages?
>>
>> I haven't actually used the TextCat plugin, but according to the
>> documentation [1]
>>
>> "The rule UNWANTED_LANGUAGE_BODY is triggered if none of the languages
>> detected are in the "ok" list."
>>
>> English is NOT one of the languages recognized. Given it fired the
>> unwanted language rule, at least one language has been recognized with
>> an acceptable score above the threshold.
>>
>> Your problem is not TextCat recognizing the other language (probably
>> French), but TextCat failing to recognize English in that message.
>>
>>
>> [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_TextCat.html
>>
> I start thinking the issue is more interesting than I originally
> thought. I removed the caption in French and fed it manually to
> spamassassin
>
> spamassassin -D -t < spam2.eml
>
> I am still getting the
>
> 4.5 UNWANTED_LANGUAGE_BODY BODY: Message written in an undesired language
>
> message. Is there a way I can be a bit more verbose so that it tells
> me what part of the body caused it to give that message?
>
I see what you mean about my

ok_languages en fr

possibly being cheerfully ignored for English. But I thought that the
headers

Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:

would indicate it saw English there. I am not ignoring what you
suggested; I am just trying to figure out what is happening here,
especially since most of our emails do not seem to exhibit this problem.

>> --
>> char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
>> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
>> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
>>
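
To make the rule logic quoted from the TextCat documentation concrete, here is a minimal Python sketch of how I read it. This is only an illustration, not the plugin's actual Perl code: the function name and the example language lists are made up, and the handling of an empty detection result is an assumption based on Karsten's reasoning that the rule firing implies at least one language was recognized.

```python
def unwanted_language_body(detected, ok_languages):
    """Sketch of the documented rule: fire if none of the detected
    languages are in the "ok" list. Hypothetical helper, not plugin code."""
    # Assumption (following the reasoning in the thread): if no language
    # was detected at all, the rule does not fire.
    if not detected:
        return False
    ok = set(ok_languages)
    # Fire only when there is no overlap between detected and ok languages.
    return not any(lang in ok for lang in detected)


# Only French detected, only English allowed: the rule fires.
print(unwanted_language_body(["fr"], ["en"]))        # True
# English detected alongside French: the rule does not fire.
print(unwanted_language_body(["en", "fr"], ["en"]))  # False
# Nothing detected: no hit under the assumption above.
print(unwanted_language_body([], ["en"]))            # False
```

Under this reading, a hit with "ok_languages en fr" would mean that neither English nor French was among the detected languages, which matches the point that the problem is English not being recognized rather than French being recognized.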