On Thu, Feb 16, 2023 at 10:18:50AM +0100, hg user wrote:
> I was investigating a bunch of bitcoin spam: different titles,
> different senders (all from gmail), different text, different pdf
> attachment.
>
> Unfortunately in those days my bayes db was polluted and they all got
> a BAYES_50, 0.8.
>
> I tested the messages now with a recreated bayes db and got some
> BAYES_999. So I dug to understand if I already saw the spam...
>
> But the debug result was unpleasant:
> dbg: bayes: tokenized header: 92 tokens
> dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
> dbg: bayes: token 'HX-Google-DKIM-Signature:20210112' => 0.997244532803181
> dbg: bayes: token 'H*r:sk:<START_OF_RECIPIENT_EMAIL_ADDRESS>' => 0.997244532803181
> dbg: bayes: token 'H*r:a05' => 0.995425742574258
> dbg: bayes: token 'HAuthentication-Results:sk:<MY_SA_HOSTNAME>.' => 0.986543689320388
> dbg: bayes: token 'HX-Google-DKIM-Signature:reply-to' => 0.916110175863517
> dbg: bayes: token 'H*r:2002' => 0.877842810325844
> dbg: bayes: token 'HAuthentication-Results:2048-bit' => 0.858520043212023
> dbg: bayes: token 'HAuthentication-Results:pass' => 0.855193895034317
> dbg: bayes: score = 0.999997915091326
>
>
> Every score is based on headers, very generic headers. and some
> related to my setup.
>
> Not a single token from the message body....
The Bayes implementation has been practically unmaintained for a long time, so YMMV. You can try something like the following; in my (unscientific) observation, most of these headers are tokenized badly and generate random garbage that biases the score:

bayes_ignore_header ARC-Authentication-Results
bayes_ignore_header ARC-Message-Signature
bayes_ignore_header ARC-Seal
bayes_ignore_header Authentication-Results
bayes_ignore_header Autocrypt
bayes_ignore_header IronPort-SDR
bayes_ignore_header suggested_attachment_session_id
bayes_ignore_header X-Brightmail-Tracker
bayes_ignore_header X-Exchange-Antispam-Report-CFA-Test
bayes_ignore_header X-Forefront-Antispam-Report
bayes_ignore_header X-Forefront-Antispam-Report-Untrusted
bayes_ignore_header X-Gm-Message-State
bayes_ignore_header X-Google-DKIM-Signature
bayes_ignore_header x-microsoft-antispam
bayes_ignore_header X-Microsoft-Antispam-Message-Info
bayes_ignore_header X-Microsoft-Antispam-Message-Info-Original
bayes_ignore_header X-Microsoft-Antispam-Untrusted
bayes_ignore_header X-Microsoft-Exchange-Diagnostics
bayes_ignore_header x-ms-exchange-antispam-messagedata
bayes_ignore_header x-ms-exchange-antispam-messagedata-0
bayes_ignore_header x-ms-exchange-crosstenant-id
bayes_ignore_header x-ms-exchange-crosstenant-network-message-id
bayes_ignore_header x-ms-exchange-crosstenant-rms-persistedconsumerorg
bayes_ignore_header X-MS-Exchange-CrossTenant-userprincipalname
bayes_ignore_header x-ms-exchange-slblob-mailprops
bayes_ignore_header x-ms-office365-filtering-correlation-id
bayes_ignore_header X-MSFBL
bayes_ignore_header X-Provags-ID
bayes_ignore_header X-SG-EID
bayes_ignore_header X-SG-ID
bayes_ignore_header X-UI-Out-Filterresults
bayes_ignore_header X-ClientProxiedBy
bayes_ignore_header X-MS-Exchange-CrossTenant-FromEntityHeader
bayes_ignore_header X-OriginatorOrg
bayes_ignore_header X-MS-Exchange-CrossTenant-OriginalArrivalTime
bayes_ignore_header X-MS-TrafficTypeDiagnostic
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthAs
bayes_ignore_header X-MS-Exchange-Transport-CrossTenantHeadersStamped
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthSource
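To see whether a score like the one quoted above is driven only by header tokens, and which headers contribute most, you could run a rough Python sketch over the Bayes debug output (e.g. the stderr of `spamassassin -D bayes < message`). It is only a heuristic. It assumes header tokens have the shape `H<name>:<value>` or `H*<abbrev>:<value>`, as in the quoted lines, and it treats everything else as a body token. A body word that happens to look like `Hfoo:` would be misclassified. The function name `summarize` is just illustrative.

```python
import re
from collections import Counter

# Matches debug lines like: dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
TOKEN_LINE = re.compile(r"bayes: token '(.*)'\s*=>\s*([0-9.eE+-]+)")

# Heuristic assumption: header tokens look like 'H<name>:...' or 'H*<abbrev>:...'
# (e.g. 'HX-Received:Jan', 'H*r:a05'), matching the shapes in the debug output.
HEADER_TOKEN = re.compile(r"^H(\*[A-Za-z]+|[A-Za-z0-9-]+):")


def summarize(debug_text):
    """Split Bayes debug tokens into header and body tokens.

    Returns (header_tokens, body_tokens, header_name_counts), where the token
    lists hold (token, probability) pairs and the Counter maps each header
    name (or abbreviation such as '*r') to how many tokens it produced.
    """
    header_tokens, body_tokens = [], []
    names = Counter()
    for line in debug_text.splitlines():
        m = TOKEN_LINE.search(line)
        if not m:
            continue  # skip 'tokenized header', 'score =' and unrelated lines
        token, prob = m.group(1), float(m.group(2))
        h = HEADER_TOKEN.match(token)
        if h:
            header_tokens.append((token, prob))
            names[h.group(1)] += 1
        else:
            body_tokens.append((token, prob))
    return header_tokens, body_tokens, names
```

For the nine token lines quoted above, this would report nine header tokens and no body tokens, with `*r` and `Authentication-Results` producing three tokens each. Header names that keep showing up in that Counter are candidates for `bayes_ignore_header`, though whether to ignore them is a judgment call for your own mail flow.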