On Thu, Dec 5, 2013 at 5:29 PM, Mauricio Tavares <raubvo...@gmail.com> wrote:
> On Wed, Nov 27, 2013 at 7:48 PM, Karsten Bräckelmann
> <guent...@rudersport.de> wrote:
>> On Wed, 2013-11-27 at 13:38 -0500, Mauricio Tavares wrote:
>>> Let's say I have
>>>
>>> ok_languages en
>>>
>>> and I get an email from Canada that is mostly in English but for the
>>> little disclaimer on the bottom. How can I tell textcat to only flag
>>> an email if more than some percentage of the body text is not in a
>>> ok_languages?
>>
>> I haven't actually used the TextCat plugin, but according to the
>> documentation [1]
>>
>>  "The rule UNWANTED_LANGUAGE_BODY is triggered if none of the languages
>>   detected are in the "ok" list."
>>
>> English is NOT one of the languages recognized. Given it fired the
>> unwanted language rule, at least one language has been recognized with
>> an acceptable score above the threshold.
>>
>> Your problem is not TextCat recognizing the other language (probably
>> French), but TextCat failing to recognize English in that message.
>>
>>
>> [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_TextCat.html
>>
>       I am starting to think the issue is more interesting than I
> originally thought. I removed the caption in French and fed the message
> manually to spamassassin
>
> spamassassin -D -t  < spam2.eml
>
> I am still getting the
>
>  4.5 UNWANTED_LANGUAGE_BODY BODY: Message written in an undesired language
>
> message. Is there a way to make spamassassin a bit more verbose, so that
> it tells me what part of the body caused it to give that message?
>
      I see what you mean about my

ok_languages en fr

possibly being cheerfully ignored for English. But I thought that

   Accept-Language: en-US
   Content-Language: en-US
   X-MS-Has-Attach:
   X-MS-TNEF-Correlator:

would indicate it saw English there. I am not ignoring what you
suggested; I am just trying to figure out what is happening here,
especially since most of our emails do not seem to exhibit this
problem.
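
To check my own understanding of the documentation sentence quoted
above, here is a minimal sketch of the rule as I read it. This is not
the plugin's code: the function name and the language codes in the
examples are made up for illustration, and the "nothing detected means
no hit" branch is my assumption about how an unrecognized body is
treated.

```python
def unwanted_language_body(detected, ok_languages):
    """Return True if the rule would fire under the documented wording:
    "triggered if none of the languages detected are in the ok list".

    Hypothetical helper for illustration only, not the TextCat API.
    """
    if not detected:
        # Assumption: with no language recognized, the rule does not fire.
        return False
    # Fires only when *none* of the detected languages is acceptable.
    return not any(lang in ok_languages for lang in detected)


# English and French both detected, only English allowed: no hit,
# because at least one detected language is in the ok list.
print(unwanted_language_body({"en", "fr"}, {"en"}))

# Only French detected, only English allowed: the rule fires.
print(unwanted_language_body({"fr"}, {"en"}))

# English never detected (hypothetical misdetection as "nl"/"de"):
# the rule fires even though the ok list contains en and fr.
print(unwanted_language_body({"nl", "de"}, {"en", "fr"}))
```

If that reading is right, a French disclaimer by itself should not be
enough to trigger the rule; the hit would mean English was not among
the detected languages at all, which matches what you described.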

>> --
>> char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
>> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
>> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
>>
