He does not need to compare all the tokens, but a quick survey of the tokens
derived from the headers can tell him how the bayes result was formed.

A couple of weeks ago some phishing reached our inboxes. Our custom rule
gave the message 5 points, but I was surprised that the message was
categorized BAYES_00, -1.9.

I ran the bayes debug and found that clearly spammy words were not recognized
as spammy. Then I discovered that an admin had enabled auto-learning by mistake
and the database was full of garbage...

I cleared the database and retrained it with our hand-selected corpus, and
the message then scored BAYES_50.



On Wed, Feb 15, 2023 at 3:27 PM Matus UHLAR - fantomas <uh...@fantomas.sk>
wrote:

> On 15.02.23 14:53, hg user wrote:
> >If you run spamassassin with -D bayes -t xxx 2>debug.log
> >
> >in debug.log you will see all the "tokens" the bayes system extracts
> >from the headers and you will probably find a lot of them related to
> >mailing lists.
> >
> >If you teach SA that those tokens are spam and they are present in both
> >WP and Forbes emails, their emails will be flagged. That's normal.
>
> Don't expect anyone to manually compare tokens, unless they are deeply
> debugging bayes functionality.
>
> Simply put, bayes DOES gather all possible tokens and compares their
> occurrence quite effectively: if you train Forbes and WP newsletters as
> ham, and other newsletters as spam, bayes should be able to tell them
> apart nicely.
>
> However, many of the tokens in even Forbes and WP newsletters may occur in
> other, spammy newsletters, so be careful when training even these.
>
> If you get the score down enough that the message is no longer classified
> as spam, you've won and should not continue (unless you are willing to
> check all BAYES_0 mail for suspicious newsletters and train those as spam,
> watching how much it affects the mentioned Forbes and WP newsletters).
>
> Bayes training is great, but one should be careful with it.
>
>
> >If you want, you can use bayes_ignore_header to ignore some headers.
>
> This rarely helps.
>
>
> >On 2/15/23, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
> >>>> -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> >>>>      [score: 0.0000]
> >>>>
> >>>> This indicates a mistrained database, which means you have trained too
> >>>> many spams or spam-like messages (commercial messages) as ham.
> >>>>
> >>>> Proper training of spams should help. Just keep your spam (and
> >>>> optionally ham) corpora for retraining in case you ever need to drop
> >>>> the database.
> >>>>
> >>>> I also recommend abstaining from training commercial mail (notices from
> >>>> e-shops, companies you have done business with, etc.) as ham, unless
> >>>> they generate a BAYES_999 score and you want it lower. I often train
> >>>> them as spam so that they give an uncertain BAYES_50 result.
> >>
> >> On 14.02.23 23:05, Alex wrote:
> >>>Is there any way to distinguish a legitimate newsletter from a spam
> >>>newsletter?
> >>
> >> Very hard.
> >>
> >> That's why I recommend not training newsletters unless you know that you
> >> (or your users) want them and they produce a BAYES_99 result.
> >>
> >>
> >>>In other words, if I train emails from Forbes or Washington Post as ham,
> >>>then train similar newsletter emails from other providers that are
> >>>more suspect, will bayes still be able to distinguish Forbes and WP as
> >>>ham?
> >>
> >>>The problem is that if I avoid training newsletters or bulk email
> >>>altogether, then I'm also left with spam newsletters still only hitting
> >>>bayes50.
> >>
> >> If you only do this for Forbes or Washington Post, bayes will likely be
> >> able to distinguish other newsletters if you train those as spam.
> >>
> >>>I'm actually in a situation now where Forbes and WP newsletters are
> >>>being marked as spam, so I'm considering retraining, but I'm wondering
> >>>what approach/best practices I should be following.
> >>
> >> This should be safe. There are many types of newsletters; the problem
> >> would only arise if you started training them as ham without really
> >> knowing they are welcome.
> >>
> >> --
> >> Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
> >> Warning: I wish NOT to receive e-mail advertising to this address.
> >> Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
> >> WinError #99999: Out of error messages.
> >>
>
>
