On Thu, Feb 16, 2023 at 10:18:50AM +0100, hg user wrote:
> I was investigating a bunch of bitcoin spam: different titles,
> different senders (all from gmail), different text, different pdf
> attachment.
> 
> Unfortunately in those days my bayes db was polluted and they all got
> a BAYES_50, 0.8.
> 
> I tested the messages now with a recreated bayes db and got some
> BAYES_999. So I dug in to find out whether I had already seen this spam...
> 
> But the debug result was unpleasant:
> dbg: bayes: tokenized header: 92 tokens
> dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
> dbg: bayes: token 'HX-Google-DKIM-Signature:20210112' => 0.997244532803181
> dbg: bayes: token 'H*r:sk:<START_OF_RECIPIENT_EMAIL_ADDRESS>' => 0.997244532803181
> dbg: bayes: token 'H*r:a05' => 0.995425742574258
> dbg: bayes: token 'HAuthentication-Results:sk:<MY_SA_HOSTNAME>.' => 0.986543689320388
> dbg: bayes: token 'HX-Google-DKIM-Signature:reply-to' => 0.916110175863517
> dbg: bayes: token 'H*r:2002' => 0.877842810325844
> dbg: bayes: token 'HAuthentication-Results:2048-bit' => 0.858520043212023
> dbg: bayes: token 'HAuthentication-Results:pass' => 0.855193895034317
> dbg: bayes: score = 0.999997915091326
> 
> 
> Every score is based on headers: very generic headers, and some
> related to my setup.
> 
> Not a single token from the message body...

The Bayes implementation has been practically unmaintained for a long time,
so YMMV.

You can try something like the following. Most of these headers are tokenized
badly and generate random garbage that biases the result (unscientific
observation):

bayes_ignore_header ARC-Authentication-Results
bayes_ignore_header ARC-Message-Signature
bayes_ignore_header ARC-Seal
bayes_ignore_header Authentication-Results
bayes_ignore_header Autocrypt
bayes_ignore_header IronPort-SDR
bayes_ignore_header suggested_attachment_session_id
bayes_ignore_header X-Brightmail-Tracker
bayes_ignore_header X-Exchange-Antispam-Report-CFA-Test
bayes_ignore_header X-Forefront-Antispam-Report
bayes_ignore_header X-Forefront-Antispam-Report-Untrusted
bayes_ignore_header X-Gm-Message-State
bayes_ignore_header X-Google-DKIM-Signature
bayes_ignore_header x-microsoft-antispam
bayes_ignore_header X-Microsoft-Antispam-Message-Info
bayes_ignore_header X-Microsoft-Antispam-Message-Info-Original
bayes_ignore_header X-Microsoft-Antispam-Untrusted
bayes_ignore_header X-Microsoft-Exchange-Diagnostics
bayes_ignore_header x-ms-exchange-antispam-messagedata
bayes_ignore_header x-ms-exchange-antispam-messagedata-0
bayes_ignore_header x-ms-exchange-crosstenant-id
bayes_ignore_header x-ms-exchange-crosstenant-network-message-id
bayes_ignore_header x-ms-exchange-crosstenant-rms-persistedconsumerorg
bayes_ignore_header X-MS-Exchange-CrossTenant-userprincipalname
bayes_ignore_header x-ms-exchange-slblob-mailprops
bayes_ignore_header x-ms-office365-filtering-correlation-id
bayes_ignore_header X-MSFBL
bayes_ignore_header X-Provags-ID
bayes_ignore_header X-SG-EID
bayes_ignore_header X-SG-ID
bayes_ignore_header X-UI-Out-Filterresults
bayes_ignore_header X-ClientProxiedBy
bayes_ignore_header X-MS-Exchange-CrossTenant-FromEntityHeader
bayes_ignore_header X-OriginatorOrg
bayes_ignore_header X-MS-Exchange-CrossTenant-OriginalArrivalTime
bayes_ignore_header X-MS-TrafficTypeDiagnostic
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthAs
bayes_ignore_header X-MS-Exchange-Transport-CrossTenantHeadersStamped
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthSource
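To decide which headers belong on such a list for your own mail flow, it can help to tally which headers the tokens in the debug output come from. Below is a rough Python sketch, not part of SpamAssassin. It assumes header tokens look like 'H<header>:<value>' as in the output quoted above (abbreviated names such as '*r' for Received are reported as-is), and it uses a heuristic: any other token is counted as non-header. The function names are made up for illustration.

```python
import re
import sys
from collections import defaultdict

# Matches debug lines such as:
#   dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
# Assumes each token line is unwrapped (the wrapping above is from email).
TOKEN_RE = re.compile(r"dbg: bayes: token '(?P<tok>[^']*)' => (?P<prob>[0-9.]+)")


def header_of(token):
    """Heuristic: return the header name for an 'H<name>:<value>' token, else None."""
    if not token.startswith("H") or ":" not in token:
        return None
    return token[1:].split(":", 1)[0]


def tally(lines):
    """Count tokens per source header; return (per-header counts, non-header count)."""
    counts = defaultdict(int)
    other = 0
    for line in lines:
        m = TOKEN_RE.search(line)
        if not m:
            continue  # skip "score =", "tokenized header", etc.
        name = header_of(m.group("tok"))
        if name is None:
            other += 1
        else:
            counts[name] += 1
    return dict(counts), other


if __name__ == "__main__":
    counts, other = tally(sys.stdin)
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{n:4d}  {name}")
    print(f"{other:4d}  (non-header tokens)")
```

Something like `spamassassin -D bayes < message.eml 2>&1 | python3 tally.py` should feed it the debug lines (the script name is hypothetical). Headers that dominate the tally across unrelated spam and ham are candidates for bayes_ignore_header, but that is still a judgement call.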
