On 15.02.23 14:53, hg user wrote:
If you run spamassassin with -D bayes -t xxx 2>debug.log,
in debug.log you will see all the "tokens" the bayes system extracts
from the headers and you will probably find a lot of them related to
mailing lists.
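To skim those tokens without reading the whole log, a short script can help.
This is only a minimal Python sketch: it assumes each token is logged on a
line containing "bayes: token '<token>' => <probability>", which you should
check against your own debug.log. The function names and the sample lines are
made up for illustration.

```python
import re

# Assumed line shape: "... dbg: bayes: token '<token>' => <probability>".
# Verify this against your own debug.log before relying on it.
TOKEN_LINE = re.compile(
    r"bayes: token '(?P<token>.*)' => (?P<prob>\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def parse_bayes_tokens(lines):
    """Return (token, probability) pairs for every matching log line."""
    tokens = []
    for line in lines:
        match = TOKEN_LINE.search(line)
        if match:
            tokens.append((match.group("token"), float(match.group("prob"))))
    return tokens


def most_spammy(tokens, n=10):
    """Return the n tokens with the highest spam probability."""
    return sorted(tokens, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up sample lines, not real SpamAssassin output.
SAMPLE_LINES = [
    "Feb 15 14:53:01 [4242] dbg: bayes: token 'HList-Id:example' => 0.997",
    "Feb 15 14:53:01 [4242] dbg: bayes: some other debug message",
    "Feb 15 14:53:01 [4242] dbg: bayes: token 'unsubscribe' => 0.12",
]

for token, prob in most_spammy(parse_bayes_tokens(SAMPLE_LINES)):
    print(f"{prob:.3f}  {token}")
```

For a real run, pass the lines of debug.log (for example
open("debug.log", errors="replace")) to parse_bayes_tokens instead of
SAMPLE_LINES.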
If you teach SA that those tokens are spam and they are also present
in WP or Forbes mail, those emails will be flagged. It's normal.
Don't expect anyone to manually compare tokens, unless they are deeply
debugging bayes functionality.
Simply said, bayes DOES gather all possible tokens and compares their
occurrence with interesting effectiveness - if you train Forbes and WP
newsletters as ham, and other newsletters as spam, bayes should be able to
distinguish them quite nicely.
However, many of the tokens in even Forbes and WP newsletters may occur in
different spammy newsletters, so be careful when training even these.
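The effect of shared tokens can be illustrated with a toy model. The Python
sketch below is NOT SpamAssassin's actual algorithm (its Bayes implementation
combines token probabilities in a more elaborate way); it is a simplified
per-token estimate, and the token names and training sets are invented.

```python
def token_spam_probability(token, spam_docs, ham_docs):
    """Toy estimate of how spammy a token is, from 0.0 (ham) to 1.0 (spam).

    Each document is a set of tokens. This is a simplification for
    illustration, not SpamAssassin's real formula.
    """
    if not spam_docs or not ham_docs:
        return 0.5  # nothing trained on one side: no usable evidence
    spam_rate = sum(token in doc for doc in spam_docs) / len(spam_docs)
    ham_rate = sum(token in doc for doc in ham_docs) / len(ham_docs)
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen in training
    return spam_rate / (spam_rate + ham_rate)


# Invented training data: wanted newsletters as ham, unwanted ones as spam.
HAM = [
    {"list-unsubscribe", "newsletter", "forbes.com"},
    {"list-unsubscribe", "newsletter", "washingtonpost.com"},
]
SPAM = [
    {"list-unsubscribe", "newsletter", "cheap-offers.example"},
    {"list-unsubscribe", "bulk-sender.example"},
]

for tok in ("list-unsubscribe", "newsletter", "forbes.com",
            "cheap-offers.example"):
    print(tok, round(token_spam_probability(tok, SPAM, HAM), 2))
```

In this toy data, tokens common to every newsletter end up near the middle,
while tokens seen only on one side (the sender's own domain, for example) are
what separate the two groups. Training many more spam newsletters pushes the
shared tokens toward spam, which is the reason for the caution above.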
If you get the score down enough that the mail is no longer classified as
spam, you've won and should not continue (unless you are willing to check all
BAYES_00 mail for suspicious newsletters and train those as spam, watching how
much that affects the Forbes and WP newsletters mentioned above).
Bayes training is great, but one should be careful with it.
If you want, you can use bayes_ignore_header to ignore some headers,
but this rarely helps.
On 2/15/23, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*      [score: 0.0000]
This indicates a mistrained database, which means you have trained too many
spams or spam-like messages (commercial messages) as ham.
Proper training of spams should help. Just keep your spam (and optionally
ham) corpora for retraining in case you have to drop the database.
I also recommend abstaining from training commercial mail (notices from
e-shops, companies you have done business with etc.) as ham, unless it
generates a BAYES_999 score and you want it lower. I often train such mail
as spam so that it gives an uncertain BAYES_50 result.
On 14.02.23 23:05, Alex wrote:
Is there any ability to distinguish a legitimate newsletter from a spam
newsletter?
Very hard.
That's why I recommend not training newsletters unless you know you or your
users want them and they produce a BAYES_99 result.
In other words, if I train emails from Forbes or Washington Post as ham,
then train similar newsletter emails from other providers that are
more suspect, will bayes still be able to distinguish Forbes and WP as
ham?
The problem is that if I avoid training newsletters or bulk email
altogether, then I'm also left with spam newsletters still only hitting
BAYES_50.
If you only do this for Forbes or Washington Post, bayes will likely be able
to distinguish other newsletters, if you train those as spam.
I'm actually in a situation now where Forbes and WP newsletters are being
marked as spam, so I'm considering retraining, but wondering what
approach/best practices I should be following.
This should be safe. There are many types of newsletters; the problem would
arise only if you started training them as ham without really knowing they
are welcome.
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #99999: Out of error messages.