Re: Bayes in V4 compared to V3

Grega via users Fri, 13 Sep 2024 02:01:42 -0700

Do you have V3 or V4 SA?


________________________________
From: Reindl Harald (privat) <ha...@rhsoft.net>
Sent: Friday, 13 September 2024 10:57
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3

autolearn was always a black box

below are the stats for the current month; the bayes is built from mail
collected from 2014 until now, and i rebuild it from scratch every month

the corpus of 178,138 messages is stored as single eml-files

a few errors with autolearn over the years can amplify and render your
bayes useless over time, with no way to do anything about it because you
don't have the corpus and don't know what was trained how
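
A monthly rebuild from a hand-sorted corpus could look roughly like the sketch below. It assumes the stock sa-learn tool and a hypothetical layout with one folder of spam and one of ham eml-files under CORPUS. The paths and the SA_LEARN override are illustrative assumptions, not the setup described in this mail.

```shell
# Sketch only: CORPUS and its spam/ and ham/ subfolders are hypothetical paths.
CORPUS="${CORPUS:-/var/lib/bayes-corpus}"
# Set SA_LEARN=echo for a dry run that only prints the sa-learn arguments.
SA_LEARN="${SA_LEARN:-sa-learn}"

rebuild_bayes() {
    "$SA_LEARN" --clear &&                           # wipe the existing database
    "$SA_LEARN" --no-sync --spam "$CORPUS/spam" &&   # learn every stored spam message
    "$SA_LEARN" --no-sync --ham "$CORPUS/ham" &&     # learn every stored ham message
    "$SA_LEARN" --sync &&                            # merge the journal into the database
    "$SA_LEARN" --dump magic                         # print message and token counts
}
```

Calling rebuild_bayes as the user that owns the bayes files runs the whole cycle. Because every message stays on disk, a wrongly sorted mail can be moved to the other folder and is corrected by the next rebuild.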

[root@mail-gw:~]$ bayes-stats.sh
0     135700    SPAM
0      42438    HAM
0    5116765    TOKEN

total 514M
  24K -rw-r----- 1 sa-milt sa-milt  24K 2024-09-12 14:11 bayes_seen
129M -rw-r----- 1 sa-milt sa-milt 160M 2024-09-12 14:11 bayes_toks
386M -rw-r----- 1 sa-milt sa-milt 386M 2024-09-12 14:10 wordlist.db

BAYES_00         4455   45.10 %
BAYES_05          363    3.67 %
BAYES_20          471    4.76 %
BAYES_40          440    4.45 %
BAYES_50         2106   21.32 %
BAYES_60          119    1.20 %     5.87 % (OF TOTAL BLOCKED)
BAYES_80          108    1.09 %     5.33 % (OF TOTAL BLOCKED)
BAYES_95           81    0.82 %     4.00 % (OF TOTAL BLOCKED)
BAYES_99         1735   17.56 %    85.72 % (OF TOTAL BLOCKED)
BAYES_999        1572   15.91 %    77.66 % (OF TOTAL BLOCKED)

DELIVERED       13865   88.15 %
DNSWL           14376   91.40 %
SPF             15203   96.66 %
SPF/DKIM WL      5705   36.27 %
SHORTCIRCUIT     5894   37.47 %

BLOCKED          2024   12.86 %
SPAMMY           2043   12.98 %   100.93 % (OF TOTAL BLOCKED)
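
For reading the right-hand column: the "OF TOTAL BLOCKED" values match the hit count divided by the BLOCKED count (2024), and they appear to be truncated rather than rounded to two decimals. The helper below is a hypothetical reconstruction of that arithmetic, not the actual bayes-stats.sh.

```shell
# pct_of PART TOTAL: print PART as a percentage of TOTAL, truncated to two decimals.
pct_of() {
    awk -v part="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", int(part / total * 10000) / 100 }'
}

pct_of 1735 2024   # BAYES_99 hits among blocked mail: 85.72
pct_of 2043 2024   # SPAMMY compared to BLOCKED: 100.93
```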

On 13.09.24 at 10:51, Grega wrote:
> This strategy worked really well in V3, and bayes was excellent even
> with autotrain and occasional manual training.
>
>
> Now it's indecisive and useless, at least for me.
>
> We have around 5k-7k daily mails...
>
>
>
> ------------------------------------------------------------------------
> *From:* Reindl Harald (privat) <ha...@rhsoft.net>
> *Sent:* Friday, 13 September 2024 10:22
> *To:* Grega; Bill Cole; Grega via users
> *Subject:* Re: Bayes in V4 compared to V3
>
>
> On 13.09.24 at 06:53, Grega via users wrote:
>> And I'm reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
>> auto-train on correct mails...
>
> this is even more nonsense than autolearn itself
>
> what you really want to train are wrongly classified messages, and that
> decision can only be made by a human
>
> if you train wrongly classified mails, in either direction, you amplify
> the incorrect result
>
> it happens that HAM MAILS have a score above 12 from time to time
> because of blacklists and over-aggressive rules, and when you then
> autolearn the content as spam your bayes ends up the way it is now
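
A minimal sketch of turning autolearn off, so that only human-reviewed mail reaches the database. It uses the standard use_bayes and bayes_auto_learn options. The function name and the target file are hypothetical, and the thresholds from the quoted mail are left in as comments for comparison.

```shell
# write_autolearn_off FILE: write a SpamAssassin config fragment to FILE
# that keeps bayes scoring enabled but disables automatic learning.
write_autolearn_off() {
    cat > "$1" <<'EOF'
use_bayes 1
bayes_auto_learn 0
# thresholds from the quoted mail, only relevant while autolearn is on:
# bayes_auto_learn_threshold_nonspam -4
# bayes_auto_learn_threshold_spam 12
EOF
}
```

Write the fragment into the local SpamAssassin config directory and check the result with spamassassin --lint before restarting the daemon.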
