Hi,

Here's also another 50+ headers we've collected over the years that I believe started as a list from AXB 10+ years ago.

https://pastebin.com/raw/f6Fwh8HJ

dave

On 2/16/23 6:02 AM, Henrik K wrote:
On Thu, Feb 16, 2023 at 10:18:50AM +0100, hg user wrote:
I was investigating a bunch of bitcoin spam: different titles,
different senders (all from gmail), different text, different pdf
attachment.

Unfortunately in those days my bayes db was polluted and they all got
a BAYES_50, 0.8.

I tested the messages now with a recreated bayes db and got some
BAYES_999. So I dug to understand if I already saw the spam...

But the debug result was unpleasant:
dbg: bayes: tokenized header: 92 tokens
dbg: bayes: token 'HX-Received:Jan' => 0.998028449502134
dbg: bayes: token 'HX-Google-DKIM-Signature:20210112' => 0.997244532803181
dbg: bayes: token 'H*r:sk:<START_OF_RECIPIENT_EMAIL_ADDRESS>' =>
0.997244532803181
dbg: bayes: token 'H*r:a05' => 0.995425742574258
dbg: bayes: token 'HAuthentication-Results:sk:<MY_SA_HOSTNAME>.' =>
0.986543689320388
dbg: bayes: token 'HX-Google-DKIM-Signature:reply-to' => 0.916110175863517
dbg: bayes: token 'H*r:2002' => 0.877842810325844
dbg: bayes: token 'HAuthentication-Results:2048-bit' => 0.858520043212023
dbg: bayes: token 'HAuthentication-Results:pass' => 0.855193895034317
dbg: bayes: score = 0.999997915091326


Every score is based on headers, very generic headers. and some
related to my setup.

Not a single token from the message body....
The Bayes implementation has been practically unmaintained for a long time,
so YMMV.

You can try something like this, most headers are parsed badly and generate
biasing random garbage (unscientific observation):

bayes_ignore_header ARC-Authentication-Results
bayes_ignore_header ARC-Message-Signature
bayes_ignore_header ARC-Seal
bayes_ignore_header Authentication-Results
bayes_ignore_header Autocrypt
bayes_ignore_header IronPort-SDR
bayes_ignore_header suggested_attachment_session_id
bayes_ignore_header X-Brightmail-Tracker
bayes_ignore_header X-Exchange-Antispam-Report-CFA-Test
bayes_ignore_header X-Forefront-Antispam-Report
bayes_ignore_header X-Forefront-Antispam-Report-Untrusted
bayes_ignore_header X-Gm-Message-State
bayes_ignore_header X-Google-DKIM-Signature
bayes_ignore_header x-microsoft-antispam
bayes_ignore_header X-Microsoft-Antispam-Message-Info
bayes_ignore_header X-Microsoft-Antispam-Message-Info-Original
bayes_ignore_header X-Microsoft-Antispam-Untrusted
bayes_ignore_header X-Microsoft-Exchange-Diagnostics
bayes_ignore_header x-ms-exchange-antispam-messagedata
bayes_ignore_header x-ms-exchange-antispam-messagedata-0
bayes_ignore_header x-ms-exchange-crosstenant-id
bayes_ignore_header x-ms-exchange-crosstenant-network-message-id
bayes_ignore_header x-ms-exchange-crosstenant-rms-persistedconsumerorg
bayes_ignore_header X-MS-Exchange-CrossTenant-userprincipalname
bayes_ignore_header x-ms-exchange-slblob-mailprops
bayes_ignore_header x-ms-office365-filtering-correlation-id
bayes_ignore_header X-MSFBL
bayes_ignore_header X-Provags-ID
bayes_ignore_header X-SG-EID
bayes_ignore_header X-SG-ID
bayes_ignore_header X-UI-Out-Filterresults
bayes_ignore_header X-ClientProxiedBy
bayes_ignore_header X-MS-Exchange-CrossTenant-FromEntityHeader
bayes_ignore_header X-OriginatorOrg
bayes_ignore_header X-MS-Exchange-CrossTenant-OriginalArrivalTime
bayes_ignore_header X-MS-TrafficTypeDiagnostic
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthAs
bayes_ignore_header X-MS-Exchange-Transport-CrossTenantHeadersStamped
bayes_ignore_header X-MS-Exchange-CrossTenant-AuthSource
--


     DaveWreski

President & CEO

Guardian Digital, Inc.

We Make Email Safe

        

        
        

        

        640-800-9446 <tel:640-800-9446>

        dwre...@guardiandigital.com <mailto:dwre...@guardiandigital.com>

        https://guardiandigital.com <https://guardiandigital.com>

        103 Godwin Ave, Suite 314, Midland Park, NJ 07432



        
facebook <https://www.facebook.com/gdlinux>       

        twitter <https://twitter.com/gdlinux>     

        linkedin <https://www.linkedin.com/company/guardiandigital>       

Reply via email to