> > Is there any benefit to training an email that's already hitting
> > bayes99?
>
> Yes. The tokens which made it hit 99% are already doing their jobs, but
> the rest of the message that Bayes isn't seeing as spammy may turn out
> to be what makes the next spam hit 99.9%
>
I have noticed that after training these messages it does still report
it learned at least something from them.

> > Does it impact the txrep score?
>
> Bayes learning does not.
>
> TxRep (like AWL) is fed not by Bayes learning (sa-learn) but rather it
> tracks the combination of an address and a source IP range (/24) with a
> tally of the SA scores of messages using a subset of rules (no Bayes, no
> TxRep/AWL). The TxRep or AWL DB can be seen and managed using the sa-awl
> script and the listing action commands to the spamassassin script.
>
> Because of that, when I want to "learn" spam I feed to BOTH 'sa-learn
> --spam' AND 'spamassassin --add-to-blacklist' so that both databases are
> taught.
>

Okay, very interesting. It's not entirely clear from the description:

    --add-to-blocklist        Add addresses in mail to persistent address blocklist

Is there a ham equivalent?

    -W, --add-to-welcomelist  Add addresses in mail to persistent address welcomelist

> > I'd like to avoid creating rules for all of
> > these random junk, and obviously blocking the domain is futile.
>
> B2B spam is very hard to catch. On the positive side, often you *can*
> block a domain and end up rejecting its spam, even for years. You only
> notice the ones you don't block.
>
> I've also found a number of non-obvious commonalities between seemingly
> distinct B2B spam sets: phone numbers, street addresses, Message-IDs,
> distinctive headers, etc. Rules based on those sorts of things tend to
> live longer than domain blocks and cover more mail.
>

Yes, I've been doing that pretty extensively (like what John mentioned,
too), but it's very laborious, obviously. Here's a simple example of
what I've been doing with "outsource HR services" junk.
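
    # Sub-rules: each one matches a phrase typical of this spam run
    body     __HRFRAUD0  /HR Solutions Consultant/i
    body     __HRFRAUD1  /HR management/i
    body     __HRFRAUD2  /payroll|benefits|retirement plans/i
    # All three sub-rules must hit, and either Bayes must not rate the
    # message as ham (BAYES_00/BAYES_05) or the sender is a freemail address
    meta     HRFRAUD     (__HRFRAUD0 + __HRFRAUD1 + __HRFRAUD2 >= 3) && ((!BAYES_00 && !BAYES_05) || FREEMAIL_FROM)
    score    HRFRAUD     0.50
    describe HRFRAUD     HR services

Sometimes I also require a specific keyword, like "HR services", to
further reduce false positives.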
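
For anyone who wants to sanity-check the logic of a meta like HRFRAUD before deploying it, here is a rough Python sketch of how the expression evaluates. It is not SpamAssassin code: the function name `hrfraud_hits` and the `other_hits` parameter are made up for illustration, and it assumes each body sub-rule counts as 1 when it matches (no "multiple" tflag). Real body rules run against SpamAssassin's rendered body text, which this ignores.

```python
import re

# Hypothetical stand-ins for the three body sub-rules in the HRFRAUD example.
SUBRULES = {
    "__HRFRAUD0": re.compile(r"HR Solutions Consultant", re.I),
    "__HRFRAUD1": re.compile(r"HR management", re.I),
    "__HRFRAUD2": re.compile(r"payroll|benefits|retirement plans", re.I),
}


def hrfraud_hits(body, other_hits):
    """Return True when the HRFRAUD meta expression would be true.

    body       -- message body as plain text (a simplification)
    other_hits -- set of names of other rules that fired on the message
    """
    # Each sub-rule contributes 1 if it matches at least once.
    matched = sum(1 for rx in SUBRULES.values() if rx.search(body))
    # (!BAYES_00 && !BAYES_05): Bayes does not think the message is ham.
    bayes_not_hammy = (
        "BAYES_00" not in other_hits and "BAYES_05" not in other_hits
    )
    # Full meta: all three phrases AND (Bayes not hammy OR freemail sender).
    return matched >= 3 and (bayes_not_hammy or "FREEMAIL_FROM" in other_hits)
```

The point of the second half of the expression is that a message Bayes already rates as ham only triggers the rule when it also comes from a freemail sender, which is where most of the false-positive protection comes from.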