Re: No longer just embedded =9D characters in blackmail emails.

Bill Cole Wed, 20 Mar 2019 12:20:04 -0700

On 20 Mar 2019, at 9:04, piecka wrote:

Hello
We've encountered a high false positive rate with MIXED_ES rule for emails written in Czech language. Czech naturally uses all of the e, ě and é.
The situation is similar for Slovak language, which includes e and é.

It seems the same with Greek
(https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7691).
Email messages written in one of the above mentioned (probably even other)
languages have a much higher false positive rate than I would consider
acceptable.

I apologize for this: I am the instigator of MIXED_ES, which has done a good job of catching the extortion spam it was designed for and has an additional benefit of targeting a generic tactic rather than the moving target of phrasing. I would very much like to minimize how often it matches on ham.

Unfortunately, I don't have any examples of FPs, only reports of them. This makes targeted mitigation very difficult. The Rule QA system has masscheck reports of a steady but small number of hits on ham, almost all from a single smallish corpus and no more than one message in any recent masscheck actually scoring as spam overall.

I've added these lines to the block that defines MIXED_ES which may help some sites:


    lang pl  score MIXED_ES  0.01
    lang cz  score MIXED_ES  0.01
    lang sk  score MIXED_ES  0.01
    lang hr  score MIXED_ES  0.01
    lang el  score MIXED_ES  0.01

Those should get into the default rules channel within a few days.

Additionally, the default score for the rule is 3.999 which is quite high.

The current score quartet (as determined by the Rule QA system) is '2.791 2.699 2.791 2.699' and the last time any of those scores was 3.999 was 3 March. If your system is scoring it at 3.999, you should be running sa-update more often.
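
For readers unfamiliar with score quartets, here is a rough sketch of how one of the four values gets selected. It assumes the documented ordering of score sets (no Bayes and no network tests, network tests only, Bayes only, both). It is an illustration in Python, not SpamAssassin's own code, and pick_score is a hypothetical helper name.

```python
# Illustrative only: pick_score is a hypothetical helper, not SpamAssassin code.
QUARTET = (2.791, 2.699, 2.791, 2.699)  # MIXED_ES scores quoted above


def pick_score(quartet, bayes_enabled, net_enabled):
    """Return the score for the active score set (0 to 3)."""
    # Score set index: +2 when Bayes is enabled, +1 when network tests are enabled.
    score_set = (2 if bayes_enabled else 0) + (1 if net_enabled else 0)
    return quartet[score_set]


print(pick_score(QUARTET, bayes_enabled=True, net_enabled=True))    # 2.699
print(pick_score(QUARTET, bayes_enabled=False, net_enabled=False))  # 2.791
```

Under this ordering, a site running with network tests enabled would apply 2.699 for a MIXED_ES hit, and a site without them would apply 2.791.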

Also, I think it should be understood that nearly all SA rules with a positive score will match some 'ham' messages. These are "false positives" for the individual rule, but usually they are NOT false positives for SpamAssassin as a whole.
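
To make the distinction concrete, here is a minimal sketch of the arithmetic, assuming the default required_score of 5.0. Every rule name other than MIXED_ES, and the scores attached to those names, are made up for illustration; is_spam is a hypothetical helper, not part of SpamAssassin.

```python
# Illustrative only: rule names other than MIXED_ES and their scores are made up.
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def is_spam(hits, required=REQUIRED_SCORE):
    """A message is tagged as spam only if its summed rule scores reach the threshold."""
    return sum(hits.values()) >= required


# Ham that trips MIXED_ES alone stays well under the threshold.
ham = {"MIXED_ES": 2.699}
# Hypothetical extortion spam hitting several rules at once.
spam = {"MIXED_ES": 2.699, "HYPOTHETICAL_RULE_A": 1.5, "HYPOTHETICAL_RULE_B": 1.2}

print(is_spam(ham))   # False
print(is_spam(spam))  # True
```

A rule-level false positive on ham only becomes a SpamAssassin-level false positive when other positive-scoring rules also hit and push the total past the threshold.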
