On Sat, 14 Sep 2019 12:53:27 +0200
Henrik Rosenø wrote:

> I don't understand how the word 'som' can be the problem. Almost every
> email ever sent in Danish contain that word... There are about 140 of
> them in my email...
>

"Som" and "Somalia" alone wouldn't have caused the FP; the match was
specifically on " som ’".

The "’" character is a Unicode right single quotation mark (U+2019)
rather than an ordinary ASCII single quote "'". In UTF-8 it is
represented by the byte sequence E2 80 99. When the rule was written,
ISO 8859-1 was the most common character set, and in that encoding the
byte E2 represents an 'a' with a circumflex accent. So the rule was
matching something that might have been " som â".
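
To make the byte-level collision concrete, here is a small Python
sketch. The pattern below is only my stand-in for the removed sub-rule
(its real text isn't quoted in this thread), and the sample sentences
are made up for illustration.

```python
import re

# U+2019 RIGHT SINGLE QUOTATION MARK and its UTF-8 encoding.
quote = "\u2019"
utf8_bytes = quote.encode("utf-8")
print(" ".join(f"{b:02X}" for b in utf8_bytes))  # E2 80 99

# The same bytes read as ISO 8859-1: the first byte becomes 'â' (U+00E2).
as_latin1 = utf8_bytes.decode("latin-1")
print(repr(as_latin1[0]))

# Hypothetical stand-in for the old sub-rule: " som " followed by the
# byte E2, which is 'â' when the message is ISO 8859-1.
PATTERN = re.compile(rb" som \xe2")

# Made-up Danish fragments: one with typographic quotes, one with ASCII.
curly = "en mail som \u2019citat\u2019".encode("utf-8")
plain = "en mail som 'citat'".encode("utf-8")

print(bool(PATTERN.search(curly)))  # the UTF-8 quote starts with E2
print(bool(PATTERN.search(plain)))  # ASCII quote is byte 27, no hit
```

A byte-oriented match can't tell the two readings apart, which is why
a harmless " som ’" in a UTF-8 message was enough to hit.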

I wouldn't worry about this, as the sub-rule has been gone for almost a
year and most servers update rules daily. Also, mail-tester is using
score set 1 (net rules on, Bayes off), which is used on a minority of
installations. In the other sets the score is small, as the rule's
score line shows:

score DRUGS_MUSCLE 0.001 2.499 0.392 0.164
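
For anyone unsure how to read the four numbers: as I understand the
Mail::SpamAssassin::Conf documentation, they are the scores for sets
0 to 3, where set 0 is neither Bayes nor net rules, set 1 is net only,
set 2 is Bayes only and set 3 is both. A rough Python sketch of that
lookup (the helper name score_set is my own, not anything in
SpamAssassin):

```python
def score_set(bayes: bool, net: bool) -> int:
    """Return the score set index: Bayes adds 2, net rules add 1."""
    return (2 if bayes else 0) + (1 if net else 0)

# The score line from the rule, split into name and four values.
line = "score DRUGS_MUSCLE 0.001 2.499 0.392 0.164"
_, name, *values = line.split()
scores = [float(v) for v in values]

# Print the score that applies under each configuration.
for bayes in (False, True):
    for net in (False, True):
        idx = score_set(bayes, net)
        print(f"set {idx} (bayes={bayes}, net={net}): {name} = {scores[idx]}")
```

Only the net-on, Bayes-off combination picks up the 2.499; an
installation running both Bayes and net rules would get 0.164.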
