Re: Bayes autolearn: how does it resolve whether rules are body or header related?

Matus UHLAR - fantomas Sun, 09 May 2021 11:03:49 -0700

On 09.05.21 04:17, Bert Van de Poel wrote:

Dear fellow Spamassassin users,
I recently noticed that quite a lot of spam emails with high scores weren't marked for Bayes autolearning. While some senders and receivers were a common match, explaining why autolearn was "no", there was no clear explanation for other cases. I therefore put Spamassassin in debug mode to check in more detail, and noticed that fairly often autolearn is not used because the minimum score for body tests isn't achieved. After looking at some specific cases, it seems however that several rules are not considered when calculating either the header rule score or the body rule score for Bayes autolearning. I've always presumed these scores are calculated based on whether the underlying rule performs a regex on a header or on the body, but now I'm not so sure any more. I hope you can help clear up whether this is intended behaviour (and what that behaviour is) or whether I should report this as a bug.
One example I noticed is URI_DEOBFU_INSTR=3.595. This is, if I understand it correctly, a URI test that's performed on the body. Should a test like this be counted towards the body score count? Then there's the question of meta rules such as MONEY_NOHTML. If you resolve the different meta levels within this rule, it's a combination of header and body, however it's only counted towards the header score. Finally, it seems as if custom rules I've added within local.cf aren't considered. Is that indeed the case (and if so, is that by design)? I'm also not completely sure if UNWANTED_BODY_LANGUAGE and tests like razor, pyzor and DCC are considered for body scores.
Within the same realm, I'm also wondering whether these expected numbers for body and header can be tweaked and if so, how. For example the case below isn't autolearned even though it has a huge score and a vast amount of tests going off, but seemingly not enough body-related scores. Is that really the intended behaviour?
May 8 10:40:32 mail amavis[4076058]: (4076058-16) header_edits_for_quar: <fine...@dasanart.com> -> <g...@notgoingtoshare.tld>, Yes, score=24.619 tag=-9999 tag2=5 kill=7.5 tests=[ADVANCE_FEE_3_NEW_MONEY=0.001, AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1, FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095, FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25, FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001, FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001, FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001, FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no autolearn_force=no
Thank you in advance for your help. If you need any more examples or would like us to run some tests, then feel free to let me know.


looks like most of those are meta rules:

header FREEMAIL_REPLYTO_END_DIGIT
header MISSING_HEADERS
body BAYES_50
header SPF_HELO_NONE
header FSL_CTYPE_WIN1251
header NSL_RCVD_HELO_USER
header REPTO_419_FRAUD

score FREEMAIL_REPLYTO_END_DIGIT 0.25
score MISSING_HEADERS 0.915 1.207 1.204 1.021
score SPF_HELO_NONE 0.001

so you don't have points from body rules.
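
to illustrate the point, here is a rough Python sketch of that tally. It is not SpamAssassin's actual code. Assumptions on my side: a minimum of 3.0 points is needed from each of the header and body buckets (as I recall it from the AutoLearnThreshold plugin documentation), Bayes rules are left out of the autolearn decision, and any rule whose type is not listed above (e.g. the meta rules) is simply skipped here. The names MIN_POINTS, RULE_TYPES, SCORES and tally are made up for this example, and the scores are copied from the log line as-is.

```python
# Assumed minimum points required from each bucket (see note above).
MIN_POINTS = 3.0

# Rule types as listed in this reply.
RULE_TYPES = {
    "FREEMAIL_REPLYTO_END_DIGIT": "header",
    "MISSING_HEADERS": "header",
    "BAYES_50": "body",
    "SPF_HELO_NONE": "header",
    "FSL_CTYPE_WIN1251": "header",
    "NSL_RCVD_HELO_USER": "header",
    "REPTO_419_FRAUD": "header",
}

# Scores copied from the amavis log line in the quoted message.
SCORES = {
    "FREEMAIL_REPLYTO_END_DIGIT": 0.25,
    "MISSING_HEADERS": 1.021,
    "BAYES_50": 0.8,
    "SPF_HELO_NONE": 0.001,
    "FSL_CTYPE_WIN1251": 0.001,
    "NSL_RCVD_HELO_USER": 0.001,
    "REPTO_419_FRAUD": 2.996,
}


def tally(scores, rule_types, skip_prefixes=("BAYES_",)):
    """Sum scores into header and body buckets.

    Rules matching skip_prefixes (assumption: Bayes rules) and rules
    with no known header/body type are ignored.
    """
    points = {"header": 0.0, "body": 0.0}
    for rule, score in scores.items():
        if rule.startswith(skip_prefixes):
            continue
        kind = rule_types.get(rule)
        if kind in points:
            points[kind] += score
    return points


points = tally(SCORES, RULE_TYPES)
header_ok = points["header"] >= MIN_POINTS
body_ok = points["body"] >= MIN_POINTS
print(points, "header_ok:", header_ok, "body_ok:", body_ok)
```

under these assumptions the header bucket clears the minimum but the body bucket stays at zero, which would be enough to block autolearning regardless of the total score.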

your mentioned URI_DEOBFU_INSTR is a meta rule:

meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST

so maybe it's not considered.


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux IS user friendly, it's just selective who its friends are...
