Re: Bayes in V4 compared to V3

Bill Cole Fri, 13 Sep 2024 06:05:25 -0700

Please note that "Reindl Harald" is excluded from posting to the SpamAssassin Users mailing list as a consequence of past behavior. It is my understanding that they still follow the list via some public archive and reply off-list whenever they have an opportunity to be rude towards people with SpamAssassin difficulties.

Whether or not their advice is worth considering is obviously a personal judgment, but you should be aware that you are speaking with someone who has in the past worked to disrupt this list (and others.)


Please send any replies to the list only.

On 2024-09-13 at 05:00:17 UTC-0400 (Fri, 13 Sep 2024 09:00:17 +0000)
Grega <gr...@nabiralnik.eu>
is rumored to have said:

Do you have V3 or V4 SA?


________________________________
From: Reindl Harald (privat) <ha...@rhsoft.net>
Sent: Friday, 13 September 2024 10:57
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3

autolearn was always a blackbox

that below are the stats for the current month and that bayes is built
from 2014 until now and i rebuild it from scratch every month

the corpus of 178,138 messages is stored as single eml-files

a few errors with autolearn over the years can amplify and render your
bayes useless over time with no way to do anything because you don't
have the corpus and don't know what was trained how
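
A minimal sketch of the from-scratch rebuild described above, assuming the corpus is kept as one directory of spam .eml files and one of ham .eml files. The directory paths and the function name are hypothetical and not taken from the post; the sketch only builds the `sa-learn` command lines (using its documented `--clear`, `--spam`, `--ham`, `--no-sync` and `--sync` options) and does not run them.

```python
from pathlib import Path


def rebuild_commands(spam_dir: Path, ham_dir: Path) -> list[list[str]]:
    """Return the sa-learn invocations for a from-scratch Bayes rebuild."""
    return [
        # Wipe the existing Bayes database so old training cannot linger.
        ["sa-learn", "--clear"],
        # Train on the hand-sorted corpus; defer the journal sync to the end.
        ["sa-learn", "--no-sync", "--spam", str(spam_dir)],
        ["sa-learn", "--no-sync", "--ham", str(ham_dir)],
        # Sync the journal into the database once.
        ["sa-learn", "--sync"],
    ]


if __name__ == "__main__":
    # Hypothetical corpus locations, for illustration only.
    for cmd in rebuild_commands(Path("/var/lib/corpus/spam"),
                                Path("/var/lib/corpus/ham")):
        print(" ".join(cmd))
```

Because every message in the corpus was sorted by hand, a misclassified one can be moved to the other directory and the database rebuilt, which is not possible once autolearned messages are gone.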

[root@mail-gw:~]$ bayes-stats.sh
0     135700    SPAM
0      42438    HAM
0    5116765    TOKEN

total 514M
  24K -rw-r----- 1 sa-milt sa-milt  24K 2024-09-12 14:11 bayes_seen
129M -rw-r----- 1 sa-milt sa-milt 160M 2024-09-12 14:11 bayes_toks
386M -rw-r----- 1 sa-milt sa-milt 386M 2024-09-12 14:10 wordlist.db

BAYES_00         4455   45.10 %
BAYES_05          363    3.67 %
BAYES_20          471    4.76 %
BAYES_40          440    4.45 %
BAYES_50         2106   21.32 %
BAYES_60          119    1.20 %     5.87 % (OF TOTAL BLOCKED)
BAYES_80          108    1.09 %     5.33 % (OF TOTAL BLOCKED)
BAYES_95           81    0.82 %     4.00 % (OF TOTAL BLOCKED)
BAYES_99         1735   17.56 %    85.72 % (OF TOTAL BLOCKED)
BAYES_999        1572   15.91 %    77.66 % (OF TOTAL BLOCKED)

DELIVERED       13865   88.15 %
DNSWL           14376   91.40 %
SPF             15203   96.66 %
SPF/DKIM WL      5705   36.27 %
SHORTCIRCUIT     5894   37.47 %

BLOCKED          2024   12.86 %
SPAMMY           2043   12.98 %   100.93 % (OF TOTAL BLOCKED)

On 13.09.24 at 10:51, Grega wrote:

This strategy worked really great in V3 and bayes was excellent even
with autotrain and occasional manual training.


Now it's indecisive and useless, at least for me.

We have around 5k-7k daily mails...



------------------------------------------------------------------------
*From:* Reindl Harald (privat) <ha...@rhsoft.net>
*Sent:* Friday, 13 September 2024 10:22
*To:* Grega; Bill Cole; Grega via users
*Subject:* Re: Bayes in V4 compared to V3


On 13.09.24 at 06:53, Grega via users wrote:

And I'm reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
auto-train on correct mails...


this is even more nonsense than autolearn itself

what you really want to train are wrongly classified messages, and that
decision can only be made by a human

if you train wrongly classified mails in both directions you amplify the
incorrect result

it happens that HAM MAILS have a score above 12 from time to time
because of blacklists and over-aggressive rules and when you then
autolearn the content as spam your bayes will result in what it is now
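
The failure mode described above can be sketched with a deliberately simplified threshold rule, using the -4 / 12 values from the quoted message. This is an illustration under stated assumptions, not SpamAssassin's actual autolearn logic, which applies further conditions; the function name and the handling of scores exactly on a threshold are assumptions.

```python
def autolearn_decision(score: float,
                       ham_threshold: float = -4.0,
                       spam_threshold: float = 12.0) -> str:
    """Simplified, hypothetical autolearn rule based only on the score."""
    if score >= spam_threshold:
        return "spam"   # content is fed to Bayes as spam
    if score <= ham_threshold:
        return "ham"    # content is fed to Bayes as ham
    return "none"       # score is in between: nothing is learned


if __name__ == "__main__":
    # A legitimate mail pushed above 12 by blacklists or aggressive rules
    # is still learned as spam, because the rule only sees the score.
    print(autolearn_decision(12.5))
    print(autolearn_decision(3.0))
```

The rule never sees whether the score was right, so a legitimate message that lands above the spam threshold trains Bayes in the wrong direction.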



--
Bill Cole
b...@scconsult.com or billc...@apache.org

(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com addresses)

Not Currently Available For Hire
