On 2025-02-14 at 17:00:03 UTC-0500 (Fri, 14 Feb 2025 17:00:03 -0500)
Alex <mysqlstud...@gmail.com>
is rumored to have said:
Hi,
I'm using SA v4 and trying to find ways to minimize the amount of junk that
isn't tagged. Emails like "1-hour free consultation" or "buy this event
list" or "salesforce optimization" or "HR consulting" that already hit
bayes99 (and bayes999) but are still just shy of 5 points.
Is there any benefit to training an email that's already hitting bayes99?
Yes. The tokens which made it hit 99% are already doing their jobs, but
the rest of the message that Bayes isn't seeing as spammy may turn out
to be what makes the next spam hit 99.9%.
Does it impact the txrep score?
Bayes learning does not.
TxRep (like AWL) is fed not by Bayes learning (sa-learn) but rather it
tracks the combination of an address and a source IP range (/24) with a
tally of the SA scores of messages using a subset of rules (no Bayes, no
TxRep/AWL). The TxRep or AWL DB can be seen and managed using the sa-awl
script and the listing action commands to the spamassassin script.
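
To make the keying described above concrete, here is a toy Python sketch of
a per-sender tally indexed by address plus source network. It is not TxRep's
actual code or storage format: the names reputation_key and ReputationTally
are hypothetical, the /24 prefix simply follows the description above, and
the score passed in is assumed to be one already computed without the Bayes
and TxRep/AWL rules.

```python
import ipaddress
from collections import defaultdict


def reputation_key(address: str, source_ip: str, prefix_len: int = 24) -> str:
    """Combine a sender address with the network that contains its source IP.

    prefix_len=24 follows the /24 mentioned in the text; it is an assumption
    of this sketch, not a statement about the plugin's configuration.
    """
    # strict=False accepts a host address and returns its enclosing network.
    network = ipaddress.ip_network(f"{source_ip}/{prefix_len}", strict=False)
    return f"{address.lower()}|{network}"


class ReputationTally:
    """Hypothetical in-memory tally: message count and summed score per key."""

    def __init__(self) -> None:
        # key -> [message count, total score]
        self._totals = defaultdict(lambda: [0, 0.0])

    def record(self, address: str, source_ip: str, score: float) -> None:
        """Add one message's score to the tally for this address and network."""
        entry = self._totals[reputation_key(address, source_ip)]
        entry[0] += 1
        entry[1] += score

    def mean(self, address: str, source_ip: str):
        """Return the average recorded score, or None if the key is unseen."""
        key = reputation_key(address, source_ip)
        count, total = self._totals.get(key, (0, 0.0))
        return total / count if count else None
```

Two messages from the same address sent from 192.0.2.10 and 192.0.2.200
land on the same key, while the same address seen from a different network
starts a fresh tally.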
Because of that, when I want to "learn" spam I feed to BOTH 'sa-learn
--spam' AND 'spamassassin --add-to-blacklist' so that both databases are
taught.
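
A minimal Python sketch of that two-step habit, assuming the message has
been saved to a file and that sa-learn and spamassassin are on the PATH of
the user whose databases should be updated. The helper names learn_commands
and learn_spam are hypothetical; only the two command lines come from the
paragraph above.

```python
import subprocess
from typing import Callable, List


def learn_commands(message_path: str) -> List[List[str]]:
    """Return the two command lines that teach Bayes and the reputation DB."""
    return [
        ["sa-learn", "--spam", message_path],
        ["spamassassin", "--add-to-blacklist", message_path],
    ]


def learn_spam(message_path: str, run: Callable = subprocess.run) -> None:
    """Run both commands in order, stopping if either one fails."""
    for command in learn_commands(message_path):
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so a failed Bayes update is not silently
        # followed by the reputation update.
        run(command, check=True)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python learn_spam.py /path/to/message
    learn_spam(sys.argv[1])
```

The run parameter exists only so the sequence can be exercised without
touching real databases.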
I'd like to avoid creating rules for all of this random junk, and
obviously blocking the domain is futile.
B2B spam is very hard to catch. On the positive side, often you *can*
block a domain and end up rejecting its spam, even for years. You only
notice the ones you don't block.
I've also found a number of non-obvious commonalities between seemingly
distinct B2B spam sets: phone numbers, street addresses, Message-IDs,
distinctive headers, etc. Rules based on those sorts of things tend to
live longer than domain blocks and cover more mail.
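
One way to hunt for such commonalities is to pull a few candidate features
out of a pile of missed spam and see which ones recur. The Python sketch
below is only an illustration under stated assumptions: the function names
are hypothetical, the phone pattern covers only North American style
numbers, only single-part text bodies are scanned, and the three features
chosen (Message-ID domain, X-Mailer, phone number) are examples rather than
a recommended set.

```python
import re
from collections import Counter
from email import message_from_string

# Assumption: 3-3-4 digit phone numbers separated by '-', '.' or a space.
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def features(raw: str) -> set:
    """Extract a small set of comparable features from one raw message."""
    msg = message_from_string(raw)
    found = set()

    # Right-hand side of the Message-ID often identifies the sending system.
    msg_id = msg.get("Message-ID", "")
    if "@" in msg_id:
        domain = msg_id.rsplit("@", 1)[1].strip("<> \t").lower()
        found.add("msgid-domain:" + domain)

    mailer = msg.get("X-Mailer")
    if mailer:
        found.add("x-mailer:" + mailer.strip())

    # Multipart messages return a list here and are skipped in this sketch.
    body = msg.get_payload()
    if isinstance(body, str):
        for phone in PHONE_RE.findall(body):
            # Normalize to digits so 555-010-0199 and 555.010.0199 compare equal.
            found.add("phone:" + re.sub(r"\D", "", phone))
    return found


def shared_features(raw_messages, min_count: int = 2) -> dict:
    """Return features seen in at least min_count messages, with counts."""
    counts = Counter()
    for raw in raw_messages:
        counts.update(features(raw))
    return {name: n for name, n in counts.items() if n >= min_count}
```

Anything this turns up is only a candidate for a local rule; it still needs
checking against ham before it is given a score.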
These also aren't always one-offs, but maybe a dozen or twenty of each over
a short period that get through, likely before the URIs are blocked through
other means. Other times they don't have a link at all.
I have a pretty even balance of about 25k/20k ham/spam, so not sure what
more can be done.
Ideas greatly appreciated.
--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire