Please note that "Reindl Harald" is excluded from posting to the
SpamAssassin Users mailing list as a consequence of past behavior. It is
my understanding that they still follow the list via some public archive
and reply off-list whenever they have an opportunity to be rude towards
people with SpamAssassin difficulties.
Whether or not their advice is worth considering is obviously a personal
judgment, but you should be aware that you are speaking with someone who
has in the past worked to disrupt this list (and others.)
Please send any replies to the list only.
On 2024-09-13 at 05:00:17 UTC-0400 (Fri, 13 Sep 2024 09:00:17 +0000)
Grega <gr...@nabiralnik.eu>
is rumored to have said:
Do you have V3 or V4 SA?
________________________________
From: Reindl Harald (privat) <ha...@rhsoft.net>
Sent: Friday, 13 September 2024 10:57
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3
autolearn was always a black box
below are the stats for the current month; the bayes is built from
2014 until now and i rebuild it from scratch every month
the corpus of 178,138 messages is stored as single eml files
a few errors with autolearn over the years can amplify and render your
bayes useless over time, with no way to do anything about it because you
don't have the corpus and don't know what was trained how
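As a rough illustration of the monthly rebuild described above, here is a minimal Python sketch that only assembles the commands one might run. It assumes the rebuild is done with `sa-learn` and that the corpus sits in hypothetical `spam/` and `ham/` subdirectories; the thread does not say which tooling or layout is actually used.

```python
from pathlib import Path


def rebuild_commands(corpus_dir):
    """Return sa-learn invocations for a from-scratch Bayes rebuild.

    Assumption (not from the thread): the corpus lives in
    <corpus_dir>/spam and <corpus_dir>/ham, one .eml file per message.
    """
    corpus = Path(corpus_dir)
    return [
        # Wipe the existing Bayes database so old mistakes do not carry over.
        ["sa-learn", "--clear"],
        # Train both classes from the stored corpus; defer the journal sync.
        ["sa-learn", "--no-sync", "--spam", str(corpus / "spam")],
        ["sa-learn", "--no-sync", "--ham", str(corpus / "ham")],
        # Sync the journal into the database once at the end.
        ["sa-learn", "--sync"],
    ]


if __name__ == "__main__":
    # Hypothetical corpus location, for illustration only.
    for argv in rebuild_commands("/var/lib/bayes-corpus"):
        print(" ".join(argv))
```

The sketch prints the commands instead of running them, so it can be reviewed before anything touches a live Bayes database.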
[root@mail-gw:~]$ bayes-stats.sh
0 135700 SPAM
0 42438 HAM
0 5116765 TOKEN
total 514M
24K -rw-r----- 1 sa-milt sa-milt 24K 2024-09-12 14:11 bayes_seen
129M -rw-r----- 1 sa-milt sa-milt 160M 2024-09-12 14:11 bayes_toks
386M -rw-r----- 1 sa-milt sa-milt 386M 2024-09-12 14:10 wordlist.db
BAYES_00        4455   45.10 %
BAYES_05         363    3.67 %
BAYES_20         471    4.76 %
BAYES_40         440    4.45 %
BAYES_50        2106   21.32 %
BAYES_60         119    1.20 %     5.87 % (OF TOTAL BLOCKED)
BAYES_80         108    1.09 %     5.33 % (OF TOTAL BLOCKED)
BAYES_95          81    0.82 %     4.00 % (OF TOTAL BLOCKED)
BAYES_99        1735   17.56 %    85.72 % (OF TOTAL BLOCKED)
BAYES_999       1572   15.91 %    77.66 % (OF TOTAL BLOCKED)

DELIVERED      13865   88.15 %
DNSWL          14376   91.40 %
SPF            15203   96.66 %
SPF/DKIM WL     5705   36.27 %
SHORTCIRCUIT    5894   37.47 %
BLOCKED         2024   12.86 %
SPAMMY          2043   12.98 %   100.93 % (OF TOTAL BLOCKED)
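For readers checking the "(OF TOTAL BLOCKED)" column, the following Python sketch reproduces it from the counts above. It assumes the script truncates to two decimals instead of rounding, which is inferred from the numbers (for example 1572 / 2024 is about 77.668 %, shown as 77.66 %) and is not stated in the thread. The function name is hypothetical.

```python
def percent_truncated(count, total):
    """Share of total in percent, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises near a boundary.
    return (count * 10000 // total) / 100


BLOCKED = 2024  # count taken from the BLOCKED row above

for tag, count in [("BAYES_99", 1735), ("BAYES_999", 1572), ("SPAMMY", 2043)]:
    print(f"{tag:<10} {percent_truncated(count, BLOCKED):6.2f} %")
```

SPAMMY comes out above 100 % simply because its count (2043) is larger than the BLOCKED count (2024).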
On 13.09.24 at 10:51, Grega wrote:
This strategy worked really great in V3 and bayes was excellent even
with autotrain and occasional manual training.
Now it's indecisive and useless, at least for me.
We have around 5k-7k daily mails...
------------------------------------------------------------------------
*From:* Reindl Harald (privat) <ha...@rhsoft.net>
*Sent:* Friday, 13 September 2024 10:22
*To:* Grega; Bill Cole; Grega via users
*Subject:* Re: Bayes in V4 compared to V3
On 13.09.24 at 06:53, Grega via users wrote:
And I'm reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
auto-train on correct mails...
this is even more nonsense than autolearn itself
what you really want to train are wrongly classified messages, and that
decision can only be made by a human
if you train wrongly classified mails in both directions you amplify the
incorrect result
it happens that HAM MAILS have a score above 12 from time to time
because of blacklists and over-aggressive rules, and when you then
autolearn the content as spam your bayes ends up the way it is now
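The failure mode described above can be sketched in a few lines of Python. This is a deliberately simplified model of threshold-based autolearning with the -4 / 12 values from the thread (in SpamAssassin these correspond to `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam`). The function name, the strict comparisons at the boundaries, and the example scores are assumptions, and the additional conditions SpamAssassin applies before autolearning are ignored.

```python
def autolearn_decision(score, ham_threshold=-4.0, spam_threshold=12.0):
    """Simplified threshold check: return 'ham', 'spam', or None.

    Only the score is considered; nothing verifies that the label is
    actually correct, which is the problem raised in the thread.
    """
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None  # between the thresholds: nothing is learned


# Hypothetical legitimate message pushed to 12.5 by blacklist hits and
# over-aggressive rules: its content would be learned as spam.
print(autolearn_decision(12.5))  # spam
# Hypothetical borderline message: not learned in either direction.
print(autolearn_decision(3.0))   # None
```

Raising the spam threshold makes such mislabels rarer but does not rule them out, because a score is still standing in for a human decision.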
--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com
addresses)
Not Currently Available For Hire