On 30.01.24 09:59, joe a wrote:
Advisable to "prune" Bayes data based on age?
While cleaning up recent Ham/Spam, found my "saved SPAM" goes back
to 2013.
Why that's over . . . wait, I need to take off my socks . . .
So, how old is "too old" for saved SPAM?
On 1/30/2024 10:58:52, Matus UHLAR - fantomas wrote:
I have retrained on old spam a few times and it worked fine.
It depends on how much mail you have:
0.000          0       7542          0  non-token data: nspam
0.000          0      80869          0  non-token data: nham
0.000          0     996032          0  non-token data: ntokens
0.000          0 1172945918          0  non-token data: oldest atime
So even old spam may be fine. You do, however, need plenty of ham to train on,
otherwise everything starts looking like spam.
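
To put counters like these in perspective, they can be read programmatically. The sketch below is only an illustration: it assumes the lines look like the excerpt above (the format printed by `sa-learn --dump magic`, with whitespace-separated columns followed by a `non-token data:` label), and that "oldest atime" is a Unix timestamp. The names `parse_magic` and `summarize` are made up for this example.

```python
from datetime import datetime, timezone

# Counter lines as quoted in this thread.
SAMPLE = """\
0.000 0 7542 0 non-token data: nspam
0.000 0 80869 0 non-token data: nham
0.000 0 996032 0 non-token data: ntokens
0.000 0 1172945918 0 non-token data: oldest atime
"""


def parse_magic(text):
    """Return {label: int value} for every 'non-token data' line."""
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if not sep or len(fields) < 3:
            continue  # not a counter line
        try:
            values[label.strip()] = int(fields[2])  # third column holds the value
        except ValueError:
            continue
    return values


def summarize(values):
    """Compute the ham-to-spam ratio and convert the oldest atime to a date."""
    nspam, nham = values["nspam"], values["nham"]
    oldest = datetime.fromtimestamp(values["oldest atime"], tz=timezone.utc)
    return {
        "ham_per_spam": nham / nspam if nspam else float("inf"),
        "oldest_atime": oldest,
    }


if __name__ == "__main__":
    info = summarize(parse_magic(SAMPLE))
    print(f"ham per spam: {info['ham_per_spam']:.1f}")
    print(f"oldest token atime: {info['oldest_atime']:%Y-%m-%d}")
```

For the figures quoted here this works out to roughly 10.7 ham messages per spam message, with the oldest token atime falling in March 2007.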
On 30.01.24 11:12, joe a wrote:
Recently missed spam has increased a bit, so I was dropping it into
"missed spam" and went poking through marked spam and found lots of
"missed ham". Which triggered my pondering.
Training on false positives and false negatives is important to keep the DB
up to date.
Full retraining only makes sense if you lose your DB, it gets corrupted, or it
starts misclassifying too often (whether or not the reason is known).
Autolearn may help if your DB is well maintained, although I have disabled
nearly all rules with negative scores, like
RCVD_IN_DNSWL_*
RCVD_IN_IADB_*
DKIMWL_WL_*
RCVD_IN_MSPIKE_*
RCVD_IN_VALIDITY_*
USER_IN_DEF_*
ALL_TRUSTED
etc., because spammers often abuse these.
I mean, they may still have a negative score, but autolearn doesn't train on them.
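
The idea of keeping a rule's score for the final verdict while leaving it out of the learning decision can be illustrated in a few lines. This is a simplified sketch, not SpamAssassin's actual autolearn logic: the function names, the example hits, their scores and the rule name URIBL_EXAMPLE are hypothetical, and the thresholds 0.1 and 12.0 are assumed to match the stock `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` values. In SpamAssassin itself the `noautolearn` tflag is the usual way to get this effect.

```python
from fnmatch import fnmatchcase

# Patterns taken from the list above; "*" matches any suffix.
NO_AUTOLEARN = [
    "RCVD_IN_DNSWL_*", "RCVD_IN_IADB_*", "DKIMWL_WL_*",
    "RCVD_IN_MSPIKE_*", "RCVD_IN_VALIDITY_*", "USER_IN_DEF_*",
    "ALL_TRUSTED",
]


def is_excluded(rule, patterns=NO_AUTOLEARN):
    """True if the rule name matches one of the excluded patterns."""
    return any(fnmatchcase(rule, p) for p in patterns)


def scores(hits, patterns=NO_AUTOLEARN):
    """Return (total score, score used for the learning decision)."""
    total = sum(hits.values())
    learn = sum(s for rule, s in hits.items() if not is_excluded(rule, patterns))
    return total, learn


def autolearn_decision(learn_score, ham_below=0.1, spam_above=12.0):
    """Simplified decision: learn as ham, learn as spam, or do not learn."""
    if learn_score < ham_below:
        return "ham"
    if learn_score > spam_above:
        return "spam"
    return "no"


if __name__ == "__main__":
    # Hypothetical hits for one message (scores invented for the example).
    hits = {"RCVD_IN_DNSWL_MED": -2.3, "HTML_MESSAGE": 0.5, "URIBL_EXAMPLE": 1.7}
    total, learn = scores(hits)
    print("with whitelist rules:   ", autolearn_decision(total))
    print("without whitelist rules:", autolearn_decision(learn))
```

In this made-up case the whitelist hit alone would push the message under the ham threshold, so a spammer abusing that whitelist could get spam learned as ham; with the rule excluded, the message is simply not learned.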
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
M$ Win's are shit, do not use it !