Do you have V3 or V4 SA?
________________________________
From: Reindl Harald (privat) <ha...@rhsoft.net>
Sent: Friday, 13 September 2024 10:57
To: Grega; Bill Cole; Grega via users
Subject: Re: Bayes in V4 compared to V3

autolearn was always a black box

below are the stats for the current month. that bayes has been built from 2014 until now, and i rebuild it from scratch every month. the corpus of 178,138 messages is stored as single eml-files

a few errors with autolearn over the years can amplify and render your bayes useless over time, with no way to do anything about it, because you don't have the corpus and don't know what was trained or how

[root@mail-gw:~]$ bayes-stats.sh
0     135700  SPAM
0      42438  HAM
0    5116765  TOKEN

total 514M
 24K -rw-r----- 1 sa-milt sa-milt  24K 2024-09-12 14:11 bayes_seen
129M -rw-r----- 1 sa-milt sa-milt 160M 2024-09-12 14:11 bayes_toks
386M -rw-r----- 1 sa-milt sa-milt 386M 2024-09-12 14:10 wordlist.db

BAYES_00         4455   45.10 %
BAYES_05          363    3.67 %
BAYES_20          471    4.76 %
BAYES_40          440    4.45 %
BAYES_50         2106   21.32 %
BAYES_60          119    1.20 %     5.87 % (OF TOTAL BLOCKED)
BAYES_80          108    1.09 %     5.33 % (OF TOTAL BLOCKED)
BAYES_95           81    0.82 %     4.00 % (OF TOTAL BLOCKED)
BAYES_99         1735   17.56 %    85.72 % (OF TOTAL BLOCKED)
BAYES_999        1572   15.91 %    77.66 % (OF TOTAL BLOCKED)

DELIVERED       13865   88.15 %
DNSWL           14376   91.40 %
SPF             15203   96.66 %
SPF/DKIM WL      5705   36.27 %
SHORTCIRCUIT     5894   37.47 %

BLOCKED          2024   12.86 %
SPAMMY           2043   12.98 %   100.93 % (OF TOTAL BLOCKED)

On 13.09.24 at 10:51, Grega wrote:
> This strategy worked really great in V3 and bayes was excellent even
> with autotrain and occasional manual training.
>
> Now it's indecisive and useless, at least for me.
>
> We have around 5k-7k daily mails...
>
> ------------------------------------------------------------------------
> *From:* Reindl Harald (privat) <ha...@rhsoft.net>
> *Sent:* Friday, 13 September 2024 10:22
> *To:* Grega; Bill Cole; Grega via users
> *Subject:* Re: Bayes in V4 compared to V3
>
> On 13.09.24 at 06:53, Grega via users wrote:
>> And I'm reconfiguring autolearn to -4 for HAM and 12 for SPAM to really
>> auto-train on correct mails...
>
> this is even more nonsense than autolearn itself
>
> what you really want to train are wrongly classified messages, and that
> decision can only be made by a human
>
> if you autolearn wrongly classified mails, in either direction, you
> amplify the incorrect result
>
> it happens that HAM MAILS have a score above 12 from time to time
> because of blacklists and over-aggressive rules, and when you then
> autolearn the content as spam your bayes will end up the way it is now
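
To make the argument in the quoted reply concrete, here is a minimal Python sketch that contrasts threshold-based autolearning with training only on messages a human has judged misclassified. It is an illustration, not SpamAssassin's actual logic: the `Message` class, the function names, and the sample messages are all hypothetical, the -4 / 12 thresholds are the ones mentioned in the thread, and the required score of 5.0 is an assumed cutoff. Real SpamAssassin autolearn applies further conditions that this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    name: str          # hypothetical identifier for the sample message
    score: float       # total score assigned by the filter
    human_label: str   # "ham" or "spam", decided by a person


def autolearn_label(score: float,
                    ham_threshold: float = -4.0,
                    spam_threshold: float = 12.0) -> Optional[str]:
    """Simplified autolearn: trust the score alone (thresholds from the thread)."""
    if score <= ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # score is between the thresholds, nothing is learned


def filter_verdict(score: float, required: float = 5.0) -> str:
    """What the filter decided; 5.0 is an assumed required score."""
    return "spam" if score >= required else "ham"


def mislearned(messages: List[Message]) -> List[Message]:
    """Messages autolearn would train with the WRONG label."""
    return [m for m in messages
            if autolearn_label(m.score) not in (None, m.human_label)]


def train_on_error(messages: List[Message]) -> List[Message]:
    """Messages where the filter verdict disagrees with the human decision."""
    return [m for m in messages if filter_verdict(m.score) != m.human_label]


messages = [
    Message("newsletter", 12.5, "ham"),   # ham pushed above 12 by blacklists
    Message("invoice", -5.0, "ham"),
    Message("phish", 3.0, "spam"),        # spam that slipped through
    Message("pills", 20.0, "spam"),
]

print([m.name for m in mislearned(messages)])      # ['newsletter']
print([m.name for m in train_on_error(messages)])  # ['newsletter', 'phish']
```

In this toy data the ham newsletter that scored 12.5 would be autolearned as spam, which is exactly the amplification described above, while the human-reviewed list picks up both the false positive and the missed phish so they can be trained with the correct label.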