Re: BAYES_00 BODY. Negative score?

Matus UHLAR - fantomas Wed, 15 Feb 2023 06:27:15 -0800

On 15.02.23 14:53, hg user wrote:

If you run spamassassin with -D bayes -t xxx  2>debug.log


in debug.log you will see all the "tokens" the bayes system extracts
from the headers and you will probably find a lot of them related to
mailing lists.
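
A minimal sketch for skimming such a log rather than reading it by eye. It assumes the per-token debug lines look roughly like "dbg: bayes: token 'H*r:example' => 0.912"; that shape is an assumption, so check it against your own debug.log and adjust the pattern if it differs. The function names are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed shape of a per-token debug line; adjust if your log differs.
TOKEN_LINE = re.compile(
    r"bayes: token '(?P<token>.*)' => (?P<prob>[0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)"
)

def parse_bayes_tokens(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (token, probability) pairs for every line matching TOKEN_LINE."""
    tokens = []
    for line in lines:
        match = TOKEN_LINE.search(line)
        if match:
            tokens.append((match.group("token"), float(match.group("prob"))))
    return tokens

def spammiest(tokens: List[Tuple[str, float]], limit: int = 10) -> List[Tuple[str, float]]:
    """Tokens with the highest spam probability first."""
    return sorted(tokens, key=lambda item: item[1], reverse=True)[:limit]

def hammiest(tokens: List[Tuple[str, float]], limit: int = 10) -> List[Tuple[str, float]]:
    """Tokens with the lowest spam probability first."""
    return sorted(tokens, key=lambda item: item[1])[:limit]
```

To use it, open debug.log, pass the file object to parse_bayes_tokens(), and print spammiest() and hammiest() to see which header tokens (for example mailing-list ones) pull the score in each direction.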

If you teach SA that those tokens are spam and they are present in both
WP and Forbes, their emails will be flagged. It's normal.

Don't expect anyone to manually compare tokens, unless they are deeply debugging bayes functionality.

Simply put, bayes DOES gather all possible tokens and compares their occurrence quite effectively - if you train Forbes and WP newsletters as ham, and other newsletters as spam, bayes should be able to distinguish them quite nicely.

However, many of the tokens in even Forbes and WP newsletters may occur in different spammy newsletters, so be careful when training even these.
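
A toy illustration of why shared tokens need care. This is a deliberately simplified per-token estimate, not SpamAssassin's actual algorithm, and the messages and token names are invented; it only shows that tokens seen equally in ham and spam drift to a neutral value while distinctive tokens stay decisive.

```python
from collections import Counter

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Simplified estimate: spam frequency as a share of total frequency."""
    spam_freq = spam_counts[token] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[token] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # unseen token: no evidence either way
    return spam_freq / (spam_freq + ham_freq)

# Hypothetical training corpus: each message reduced to a set of tokens.
ham_messages = [
    {"forbes", "unsubscribe", "list-id"},
    {"washingtonpost", "unsubscribe", "list-id"},
]
spam_messages = [
    {"cheap-pills", "unsubscribe", "list-id"},
    {"casino-bonus", "unsubscribe", "list-id"},
]

ham_counts = Counter(t for message in ham_messages for t in message)
spam_counts = Counter(t for message in spam_messages for t in message)

probs = {
    token: token_spam_probability(
        token, spam_counts, ham_counts, len(spam_messages), len(ham_messages)
    )
    for token in sorted(set(ham_counts) | set(spam_counts))
}

for token, prob in probs.items():
    print(f"{prob:.2f}  {token}")
```

In this toy corpus the generic newsletter tokens ("unsubscribe", "list-id") end up neutral, so the verdict rests on the tokens unique to each sender. Training many more spam newsletters than ham ones would not change that here, but in a real database the shared tokens are rarely this evenly balanced.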

If you get the score down enough not to be classified as spam, you've won and should not continue (unless you are willing to check all BAYES_00 mail for suspicious newsletters and train those as spam, watching how much it affects the mentioned Forbes and WP newsletters).


Bayes training is great, but one should be careful about that.

If you want you can use bayes_ignore_header to ignore some headers.


this rarely helps.

On 2/15/23, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:

* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*      [score: 0.0000]

This indicates a mistrained database, which means you have trained too many
spams or spam-like messages (commercial messages) as ham.

Proper training of spams should help. Just keep your spam (and optionally
ham) corpora for retraining in case you drop the database.

I also recommend abstaining from training commercial mail (notices from
e-shops, companies you have done business with etc.) as ham, unless they
generate a BAYES_999 score and you want it lower.  I often train them as
spam so they give an uncertain BAYES_50 result.


On 14.02.23 23:05, Alex wrote:

Is there any ability to distinguish a legitimate newsletter from a spam
newsletter?


Very hard.

That's why I recommend not training newsletters unless you know you/users
want them and they produce a BAYES_99 result.

In other words, if I train emails from Forbes or Washington Post as ham,
then train similar newsletter emails from other providers that are
more suspect, will bayes still be able to distinguish Forbes and WP as
ham?

The problem is that if I avoid training newsletters or bulk email
altogether, then I'm also left with spam newsletters still only hitting
BAYES_50.


If you only do this for Forbes or Washington Post, bayes will likely be able
to distinguish other newsletters, if you train those as spam.

I'm actually in a situation now where Forbes and WP newsletters are being
marked as spam, so I'm considering retraining, but wondering what
approach/best practices I should be following.


This should be safe. There are many types of newsletters; the problem would
only arise if you started training them as ham without really knowing they
are welcome.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #99999: Out of error messages.


