On Tue, 13 Feb 2018, Horváth Szabolcs wrote:
> After:
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>                             [score: 0.5000]
BAYES_50 is "can't decide".
> Version: spamassassin-3.3.2-4.el6.rfx.x86_64
> $ sa-learn --dump magic --dbpath /var/spool/amavisd/.spamassassin/
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0     338770          0  non-token data: nspam
> 0.000          0    1460807          0  non-token data: nham
That ratio is really suspicious: roughly 4.3 ham for every spam. I'd
expect something closer to 1:1, or even a bit heavier on spam.
It *seems* that you have spam trained as ham; that would explain BAYES_50
with that much in the BAYES database.
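To make that ratio concrete, here is a minimal sketch that pulls nspam and nham out of `sa-learn --dump magic` output and prints the ham:spam ratio. The function name `bayes_ratio` is made up for illustration, and it assumes the message count sits in the third column, as in the dump quoted above.

```shell
# Print nspam, nham and the ham:spam ratio from `sa-learn --dump magic` output.
# Assumption: the third whitespace-separated column holds the count.
bayes_ratio() {
    awk '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            if (nspam > 0)
                printf "nspam=%d nham=%d ham:spam=%.1f:1\n", nspam, nham, nham / nspam
            else
                print "no spam count found"
        }
    '
}

# Real use (path taken from the original post):
#   sa-learn --dump magic --dbpath /var/spool/amavisd/.spamassassin/ | bayes_ratio

# Self-contained demo using the numbers quoted above:
bayes_ratio <<'EOF'
0.000          0          3          0  non-token data: bayes db version
0.000          0     338770          0  non-token data: nspam
0.000          0    1460807          0  non-token data: nham
EOF
```

With the quoted numbers this prints `nspam=338770 nham=1460807 ham:spam=4.3:1`.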
> My questions are:
> 1) Is there any chance to change the SpamAssassin settings so that
> similar messages are marked as spam in the future?
> BAYES_50 at 0.8 points is really, really low.
No, it's not. "BAYES_50" is "I can't decide" and increasing the score for
that implies "I can't decide" means "spam". That's not justified.
Don't adjust the score of BAYES_50.
It's recommended (if possible) to retain the training corpora so that
they can be reviewed and Bayes retrained from scratch if necessary.
Your admin is manually vetting user-submitted training messages. Are they
retained after being trained?
You might consider reviewing the training corpus and retraining Bayes from
scratch.
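A rough sketch of what retraining from scratch could look like, assuming the vetted messages are kept one per file under a hypothetical `$CORPUS/spam` and `$CORPUS/ham`. The `run` and `retrain_bayes` helpers and the corpus path are illustrative only; the `sa-learn` options used (`--dbpath`, `--clear`, `--spam`, `--ham`, `--dump magic`) are the standard ones. Note that `--clear` wipes the existing database, so the sketch defaults to a dry run that only prints the commands.

```shell
DBPATH=/var/spool/amavisd/.spamassassin/   # path from the original post
CORPUS=${CORPUS:-/path/to/corpus}          # assumption: wherever vetted mail is kept

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

retrain_bayes() {
    run sa-learn --dbpath "$DBPATH" --clear                  # drop the old database
    run sa-learn --dbpath "$DBPATH" --spam "$CORPUS/spam"    # learn reviewed spam
    run sa-learn --dbpath "$DBPATH" --ham  "$CORPUS/ham"     # learn reviewed ham
    run sa-learn --dbpath "$DBPATH" --dump magic             # check nspam / nham afterwards
}

retrain_bayes
```

Setting `DRY_RUN=0` would execute the commands for real; that should be done as the user that owns the database, and only after the corpus has been reviewed.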
Another note: the "before" result:
> Before: spamassassin -D -t <spam
> Content analysis details:   (0.0 points, 5.0 required)
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
>  0.0 HTML_MESSAGE           BODY: HTML included in message
> Content analysis details:   (0.8 points, 5.0 required)
...with *no* BAYES hits at all (not even BAYES_50) suggests your SA is
*not* using the database whose statistics you reported above.
First: verify which Bayes database your SA install is using, and that it
is the one you're training into and getting those stats from.
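One quick way to start on that check is to see which candidate directories actually hold a Bayes database. The sketch below assumes the default DBM file name `bayes_toks` (a custom `bayes_path` setting would change the prefix); the function name `find_bayes_dbs` is made up for illustration.

```shell
# Report which of the given directories contain a Bayes token database.
# Assumption: default DBM file naming (bayes_toks).
find_bayes_dbs() {
    for dir in "$@"; do
        if [ -f "$dir/bayes_toks" ]; then
            echo "found: $dir/bayes_toks"
        else
            echo "none:  $dir"
        fi
    done
}

# Candidates: the path from the original post, plus the default location
# for whichever user runs the command-line test.
find_bayes_dbs /var/spool/amavisd/.spamassassin "$HOME/.spamassassin"
```

If the command-line test was run as a different user than the one amavisd runs as, it may have looked in a different (possibly empty) database, which would fit the missing BAYES hit. Rerunning the test as that user with Bayes debugging enabled, for example `spamassassin -D bayes -t < spam 2>&1 | grep -i bayes`, should show which files SA opens.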
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79