Hi,

I've just retrained my bayes database (stored in SQL) with 10k hams and about 6k spams. I tried to keep newsletters out of both corpora, but some emails that present as newsletters are really spam. However, many legitimate newsletters are hitting BAYES_99 even though I haven't trained on them.
I can imagine the newsletter template is somewhat common, but does bayes have any ability to distinguish a junk newsletter from a legitimate one? I realize there's somewhat of an imbalance between hams and spams, but shouldn't there be enough of each? Would I benefit from training known trustworthy newsletters as ham? Do you have any recommendations other than adding the legitimate newsletters hitting BAYES_99 to an allowlist?

My previous bayes database had 100k emails of each type, collected over about ten years, but it also had problems identifying newsletters properly.

Thanks,
Alex