On 9/20/2016 9:46 AM, Thomas Barth wrote:
On 20.09.2016 at 15:27, Bowie Bailey wrote:
X-Spam-Status: Yes, score=14.009 tag=2 tag2=6.31 kill=6.31
tests=[HTML_MESSAGE=0.001, MESSAGEID_LOCAL=8,
MIME_HTML_ONLY=1.105,
PYZOR_CHECK=1.985, RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274]
autolearn=no autolearn_force=no
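The reported score is simply the sum of the per-test scores in the tests=[...] list. A minimal Python sketch of that arithmetic, assuming the header layout shown above; parse_spam_status is a hypothetical helper written for this illustration, not part of SpamAssassin:

```python
import re

# Header value copied from the message above, joined onto one logical line.
HEADER = (
    "Yes, score=14.009 tag=2 tag2=6.31 kill=6.31 "
    "tests=[HTML_MESSAGE=0.001, MESSAGEID_LOCAL=8, MIME_HTML_ONLY=1.105, "
    "PYZOR_CHECK=1.985, RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274] "
    "autolearn=no autolearn_force=no"
)


def parse_spam_status(header):
    """Return (reported_score, {test_name: score}) for a header in this layout."""
    reported = float(re.search(r"score=(-?[\d.]+)", header).group(1))
    tests_blob = re.search(r"tests=\[(.*?)\]", header, re.S).group(1)
    tests = {}
    for item in tests_blob.split(","):
        name, _, value = item.strip().partition("=")
        tests[name] = float(value)
    return reported, tests


reported, tests = parse_spam_status(HEADER)
total = round(sum(tests.values()), 3)      # sum of the individual rule scores
top_rule = max(tests, key=tests.get)       # rule contributing the most points

print(f"reported={reported} summed={total} largest={top_rule}")
```

Here MESSAGEID_LOCAL alone contributes 8 of the points, so this message would have crossed the 6.31 kill level on that one rule.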
The base SA ruleset is optimized to detect spam with a score of 5.0. If
you raise that score, you will allow more spam to come through. If you
lower that score, you will see more legitimate messages blocked as
spam. Make sure you know what you are doing before you change this
score.
I read that 5.0 is aggressive and suitable for a single-user setup;
conservative values are 8.0 or 11.0.
required_score n.nn (default: 5)
https://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html
I've checked most of the mails recognized as spam. The lowest score
was 8.6x so far.
Here is another mail from ...local. It was definitely spam with a zip
attachment. A sender address with digits is common.
<wynn.54...@allfromboats.com> -> <tba...@txbweb.de>, quarantine:
l/spam-lEHVGcheLkyq.gz, Message-ID:
<20160920202635.6b90ec7...@allfromboats.com.local>, mail_id:
lEHVGcheLkyq, Hits: 19.118
Maybe I should also block sender addresses with more than 2 digits in
the name?
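To make the proposed check concrete, here is a rough Python sketch of "more than 2 digits in the name", assuming "the name" means the part of the address before the @. The function names and the example addresses are made up for illustration; this is not a SpamAssassin rule:

```python
import re


def count_local_digits(address):
    """Count ASCII digits in the local part (everything before the last '@')."""
    local, _, _ = address.rpartition("@")
    return len(re.findall(r"[0-9]", local))


def exceeds_digit_limit(address, limit=2):
    """True if the local part has more than `limit` digits (hypothetical policy)."""
    return count_local_digits(address) > limit


# Made-up addresses, not taken from the logs above.
for addr in ("anna.1234@example.com", "bob42@example.com", "carol@example123.com"):
    print(addr, exceeds_digit_limit(addr))
```

The check only counts digits, so it cannot tell a generated address from a legitimate one such as a name followed by a birth year. It is probably safer as one scored signal than as an outright block.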
My experience has been that spam scoring gets error-dominated pretty
rapidly outside the range near 5.0. That is to say, the difference in
actual spamminess between messages scored 4 and 6 is far more
predictable and significant than between -1 and 1, or 10 and 12. Even a
score of 8.0 I would expect to take months of tuning to get right,
between rescoring rules and RBLs appropriately and then giving the Bayes
thresholds accurate scores on top of that. The furthest I would probably
go is 4.5 to 6.0. Outside that range, it's easy to run into
unpredictable "why was this spam blocked and that spam wasn't" scenarios.
Many of the stock published rules are scored by AI, which runs an
optimization problem to get the most spam on the right side of 5.0 and
the most ham on the left side. For the purposes of solving that problem,
the difference between a message scoring 4.8 and 4.9 is the same as the
difference between 4.0 and 4.9, or -50 and 4.9. Developers smooth out
the scoring curve by determining what rules the AI gets to score and for
how much, but that effect is strongest where we can quantify its
usefulness (near the default threshold).
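The point about distances not mattering can be sketched with a toy objective. This is a deliberately simplified stand-in that just counts messages on the wrong side of the threshold; it is not the actual rescoring code, and the corpus and the "score >= threshold means spam" convention are assumptions made for the example:

```python
THRESHOLD = 5.0  # the default required_score discussed above


def misclassified(scored_messages, threshold=THRESHOLD):
    """Count messages that land on the wrong side of the threshold.

    scored_messages: iterable of (score, is_spam) pairs.
    Assumes a message is flagged as spam when score >= threshold.
    """
    errors = 0
    for score, is_spam in scored_messages:
        flagged = score >= threshold
        if flagged != is_spam:
            errors += 1
    return errors


def corpus(missed_spam_score):
    """Made-up corpus: one spam with a variable score, one caught spam, one ham."""
    return [(missed_spam_score, True), (7.2, True), (1.0, False)]


# The objective cannot tell a near miss from a wild miss.
errors = {s: misclassified(corpus(s)) for s in (4.9, 4.8, 4.0, -50.0)}
caught = misclassified(corpus(5.1))

print(errors, caught)
```

Every variant with the spam below 5.0 costs exactly one error, whether it scored 4.9 or -50, and the cost only changes once the message crosses the threshold. An optimizer working from an objective like this has no reason to make scores meaningful far from 5.0.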
Bayes is scored with a similar consideration, built around probability.