On 29 Aug 2019, at 17:04, John Hardin <jhar...@impsec.org> wrote:
> 
> On Thu, 29 Aug 2019, Samy Ascha wrote:
> 
>> On 28 Aug 2019, at 16:48, John Hardin <jhar...@impsec.org> wrote:
>>> 
>>> On Wed, 28 Aug 2019, Samy Ascha wrote:
>>> 
>>>> Today, I encountered, for the first time, an issue with scanning an email 
>>>> that is composed in Spanish.
>>>> 
>>>> It is hitting a fuzzy match somewhere in the DRUGS_ERECTILE and 
>>>> DRUGS_ERECTILE_OBFU rules matches.
>>>> 
>>>> I'm generally looking for a way to manipulate these edge cases, where 
>>>> languages are likely to match rules assuming English for the body text.
>>>> 
>>>> Is there any best-practice for this? I'm sure this happens in others' 
>>>> networks, but I'm totally unsure on how to best resolve this.
>>>> 
>>>> Anything in the way of configuration to combat this, e.g. by combining 
>>>> language detection with other tags?
>>>> 
>>>> Or, should I look into writing my own plugin to do something similar?
>>> 
>>> Generally the approach is to add an exclusion for the specific valid 
>>> non-english word to the rule itself.
>>> 
>>> Is it possible for the FP message to be provided for analysis? (Post to 
>>> pastebin or similar and post that URL here.)
>>> 
>>> As this is a body rule, feel free to mangle the headers as needed for 
>>> privacy, apart possibly from the Subject...
>> 
>> Thank you. That is a good suggestion. The message body is available here:
>> 
>> https://pastebin.com/S73gcDVj
>> 
>> I realise this message hits a bunch of other rules, but the question remains 
>> the same ;)
>> 
>> On a side note. I've not really been searching for it yet, but is there a 
>> preferred way to do a one-shot scan + analyse of a message with 
>> Spamassassin? Something any of you would use to analyse the message in this 
>> case, for example?
> 
> Run SpamAssassin in debug mode with various flags set to capture rule hits 
> and other useful information.
> 
> Here's what I use in a script running against my SA dev environment:
> 
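> # Message to scan; defaults to spam.msg in the current directory
> SRC=${1:-spam.msg}
> export WD=$(pwd)
> unset LESSOPEN
> # -L: local tests only, -t: test mode; debug output is captured via 2>&1
> ( cd ~/develop/spamassassin/svn/trunk ; time ./spamassassin -L -t \
>     --siteconfigpath "$WD" --debug area=all,rules,rules-all,message,uri ) < "$SRC" \
>     2>&1 | grep -av " merged duplicates: " >result && less result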
> 
> 
> -- 
> John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
> jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
> There is no doubt in my mind that millions of lives could have been
> saved if the people were not "brainwashed" about gun ownership and
> had been well armed. ... Gun haters always want to forget the Warsaw
> Ghetto uprising, which is a perfect example of how a ragtag,
> half-starved group of Jews took 10 handguns and made asses out of
> the Nazis.                        -- Theodore Haas, Dachau survivor
> -----------------------------------------------------------------------
> 882 days since the first commercial re-flight of an orbital booster (SpaceX)

I sent a mail just now, including the log and the matching line, but I guess 
it hit the mailing list's filters :)

If not, excuse me for resending this message too soon.

---

Thx a lot for that extra info. That was enough to find the match.

Face-palm incoming: the match is on line 117 of the pasted message.

Very obvious, now that it's found.

I don't think I will be taking any action on this... The user should not be 
using this all-caps-with-spaces-in-between writing style. I'll tell them that 
if I get any complaints.

Safe to assume that 'specialist', written normally in English, won't hit, right?
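
A rough sketch of why I'd expect that. The pattern is a simplified stand-in 
I made up for illustration, not the actual DRUGS_ERECTILE_OBFU rule, and the 
sample strings are invented rather than copied from the pasted message. It 
assumes the rule allows one optional non-letter between letters and anchors 
on word boundaries:

```python
import re

# ASSUMPTION: simplified stand-in for an obfuscation-tolerant rule,
# NOT the real SpamAssassin DRUGS_ERECTILE_OBFU pattern.
SEP = r"[\W_]?"  # at most one non-alphanumeric character between letters
PATTERN = re.compile(r"\b" + SEP.join("cialis") + r"\b", re.IGNORECASE)


def hits(text: str) -> bool:
    """Return True if the stand-in pattern matches anywhere in text."""
    return PATTERN.search(text) is not None


# Hypothetical sample strings, not taken from the real message.
for sample in ("E S P E C I A L I S T A", "especialista", "specialist"):
    print(f"{sample!r}: {'hit' if hits(sample) else 'no hit'}")
```

With the letters spaced out, every letter sits on its own word boundary, so 
the embedded letter run can match. Written normally, the 'c' is preceded by 
another letter, so the leading word boundary fails. Whether the real rule 
behaves the same way depends on its actual pattern.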

Samy


