Re: bayes/txrep questions

Alex Sun, 16 Feb 2025 07:40:37 -0800

> > Is there any benefit to training an email that's already hitting
> > bayes99?
>
> Yes. The tokens which made it hit 99% are already doing their jobs, but
> the rest of the message that Bayes isn't seeing as spammy may turn out
> to be what makes the next spam hit 99.9%.
>


I have noticed that after training on these messages, sa-learn does still
report that it learned at least something from them.


> > Does it impact the txrep score?
>
> Bayes learning does not.
>
> TxRep (like AWL) is fed not by Bayes learning (sa-learn) but rather it
> tracks the combination of an address and a source IP range (/24) with a
> tally of the SA scores of messages using a subset of rules (no Bayes, no
> TxRep/AWL). The TxRep or AWL DB can be seen and managed using the sa-awl
> script and the listing action commands to the spamassassin script.
>
> Because of that, when I want to "learn" spam I feed to BOTH 'sa-learn
> --spam' AND 'spamassassin --add-to-blacklist' so that both databases are
> taught.
>

Okay, very interesting. It's not entirely clear from the description:

      --add-to-blocklist                Add addresses in mail to persistent address blocklist

Is there a ham equivalent?

      -W, --add-to-welcomelist          Add addresses in mail to persistent address welcomelist

>
> > I'd like to avoid creating rules for all of
> > this random junk, and obviously blocking the domain is futile.
>
> B2B spam is very hard to catch. On the positive side, often you *can*
> block a domain and end up rejecting its spam, even for years. You only
> notice the ones you don't block.
>
> I've also found a number of non-obvious commonalities between seemingly
> distinct B2B spam sets: phone numbers, street addresses, Message-IDs,
> distinctive headers, etc. Rules based on those sorts of things tend to
> live longer than domain blocks and cover more mail.
>

Yes, I've been doing that pretty extensively (like what John mentioned,
too), but it's very laborious, obviously.
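
For illustration, rules keyed on that kind of shared trait might look
something like the sketch below. Every pattern, header name, and rule name
in it is made up (a 555-01xx phone number, an example.com Message-ID, a
hypothetical X-Campaign-Id header), so treat it as a template rather than
something to deploy:

```
# Hypothetical patterns for illustration only; substitute values
# actually observed in your own spam samples.
body     __B2B_PHONE   /\b555[-. ]01\d\d\b/
header   __B2B_MSGID   Message-ID =~ /\@mailer\.example\.com>/i
header   __B2B_XHDR    exists:X-Campaign-Id
# Fire when at least two of the three shared traits are present.
meta     B2B_COMMON    (__B2B_PHONE + __B2B_MSGID + __B2B_XHDR >= 2)
score    B2B_COMMON    0.50
describe B2B_COMMON    Traits shared across B2B spam runs
```

Requiring two of three traits is an arbitrary choice here; a single very
distinctive trait may be enough on its own.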

Here's a simple example of what I've been doing with "outsource HR
services" junk.

body    __HRFRAUD0  /HR Solutions Consultant/i
body    __HRFRAUD1  /HR management/i
body    __HRFRAUD2  /payroll|benefits|retirement plans/i
meta    HRFRAUD     (__HRFRAUD0 + __HRFRAUD1 + __HRFRAUD2 >= 3) && ((!BAYES_00 && !BAYES_05) || FREEMAIL_FROM)
score   HRFRAUD     0.50
describe HRFRAUD    HR services

Sometimes I also require a specific keyword, like "HR services", to
reduce false positives even further.
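
A minimal sketch of that variant could look like this. The __HRFRAUD_KW
subrule and the HRFRAUD_KW name are made up for illustration, and it reuses
the __HRFRAUD0 to __HRFRAUD2 subrules from the example above:

```
# Hypothetical keyword gate: the meta only fires if this phrase is present.
body     __HRFRAUD_KW  /\bHR services\b/i
meta     HRFRAUD_KW    __HRFRAUD_KW && (__HRFRAUD0 + __HRFRAUD1 + __HRFRAUD2 >= 3) && ((!BAYES_00 && !BAYES_05) || FREEMAIL_FROM)
score    HRFRAUD_KW    0.50
describe HRFRAUD_KW    HR services with required keyword
```

Note that the meta has to stay on a single line in the rules file.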
