On 29 Aug 2019, at 17:04, John Hardin <jhar...@impsec.org> wrote:
>
> On Thu, 29 Aug 2019, Samy Ascha wrote:
>
>> On 28 Aug 2019, at 16:48, John Hardin <jhar...@impsec.org> wrote:
>>>
>>> On Wed, 28 Aug 2019, Samy Ascha wrote:
>>>
>>>> Today, I encountered, for the first time, an issue with scanning an email
>>>> that is composed in Spanish.
>>>>
>>>> It is hitting a fuzzy match somewhere in the DRUGS_ERECTILE and
>>>> DRUGS_ERECTILE_OBFU rules matches.
>>>>
>>>> I'm generally looking for a way to manipulate these edge cases, where
>>>> languages are likely to match rules assuming English for the body text.
>>>>
>>>> Is there any best-practice for this? I'm sure this happens in others'
>>>> networks, but I'm totally unsure on how to best resolve this.
>>>>
>>>> Anything in the way of configuration to combat this, e.g. by combining
>>>> language detection with other tags?
>>>>
>>>> Or, should I look into writing my own plugin to do something similar?
>>>
>>> Generally the approach is to add an exclusion for the specific valid
>>> non-english word to the rule itself.
>>>
>>> Is it possible for the FP message to be provided for analysis? (Post to
>>> pastebin or similar and post that URL here.)
>>>
>>> As this is a body rule, feel free to mangle the headers as needed for
>>> privacy, apart possibly from the Subject...
>>
>> Thank you. That is a good suggestion. The message body is available here:
>>
>> https://pastebin.com/S73gcDVj
>>
>> I realise this message hits a bunch of other rules, but the question remains
>> the same ;)
>>
>> On a side note. I've not really been searching for it yet, but is there a
>> preferred way to do a one-shot scan + analyse of a message with
>> Spamassassin? Something any of you would use to analyse the message in this
>> case, for example?
>
> Run SpamAssassin in debug mode with various flags set to capture rule hits
> and other useful information.
>
> Here's what I use in a script running against my SA dev environment:
>
> SRC=${1:-spam.msg}
> export WD=`pwd`
> unset LESSOPEN
> ( cd ~/develop/spamassassin/svn/trunk ; time ./spamassassin -L -t
> --siteconfigpath $WD --debug area=all,rules,rules-all,message,uri ) < "$SRC"
> 2>&1 | grep -av " merged duplicates: " >result && less result
>
>
> --
> John Hardin KA7OHZ http://www.impsec.org/~jhardin/
> jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
> There is no doubt in my mind that millions of lives could have been
> saved if the people were not "brainwashed" about gun ownership and
> had been well armed. ... Gun haters always want to forget the Warsaw
> Ghetto uprising, which is a perfect example of how a ragtag,
> half-starved group of Jews took 10 handguns and made asses out of
> the Nazis. -- Theodore Haas, Dachau survivor
> -----------------------------------------------------------------------
> 882 days since the first commercial re-flight of an orbital booster (SpaceX)

I sent a mail just now, including the log and the matching line, but I guess
it hit the mailing list's filters :) If not, excuse me for sending this
message again so soon.

---

Thanks a lot for that extra info. That was enough to find the match.

Face-palm incoming: the match is on line 117 of the pasted message. Very
obvious, now that it's found.

I don't think I will be taking any action on this... The user should not be
using this all-caps-with-spaces-in-between writing style. I'll tell them
that if I get any complaints.

Safe to assume that 'specialist', written in normal English, won't hit, right?

Samy
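
To illustrate the question above, here is a minimal Python sketch of why spaced-out capitals can trip an obfuscation-tolerant pattern while the same word written normally does not. Everything in it is an assumption made for illustration: the pattern is a simplified stand-in and not the actual DRUGS_ERECTILE or DRUGS_ERECTILE_OBFU rule, and the sample strings are hypothetical rather than copied from the pasted message.

```python
import re

# Hypothetical, simplified stand-in for an obfuscation-tolerant rule.
# It allows at most one non-word character between letters and requires
# a word boundary on both sides. This is NOT the real SpamAssassin rule.
FUZZY = re.compile(r"\bc\W?i\W?a\W?l\W?i\W?s\b", re.IGNORECASE)


def hits(text: str) -> bool:
    """Return True if the stand-in pattern matches anywhere in text."""
    return FUZZY.search(text) is not None


if __name__ == "__main__":
    # Hypothetical samples; the real line 117 is not reproduced here.
    samples = [
        "E S P E C I A L I S T A",  # spaced capitals: every letter is its own "word"
        "especialista",             # normal Spanish spelling
        "specialist",               # normal English spelling
    ]
    for sample in samples:
        print(f"{sample!r}: {'hit' if hits(sample) else 'no hit'}")
```

With spaces between the letters, each letter is bounded by non-word characters, so the boundary checks pass in the middle of the word. In normal spelling the same letters are surrounded by other letters and the boundary checks fail. Whether the real rules behave the same way depends on their actual definitions, which the debug output described above would show.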