On Tue, 13 Feb 2018, Horváth Szabolcs wrote:

After:

pts rule name              description
---- ---------------------- --------------------------------------------------
0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
0.0 HTML_MESSAGE           BODY: HTML included in message
0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                           [score: 0.5000]

BAYES_50 is "can't decide".


Version: spamassassin-3.3.2-4.el6.rfx.x86_64

$ sa-learn --dump magic --dbpath /var/spool/amavisd/.spamassassin/
0.000          0          3          0  non-token data: bayes db version
0.000          0     338770          0  non-token data: nspam
0.000          0    1460807          0  non-token data: nham

That ratio is really suspicious: roughly 4.3 ham for every spam. I'd expect something closer to 1:1 or even a bit heavier on spam.

It *seems* that you have spam trained as ham; that would explain BAYES_50 with that much in the BAYES database.
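To make the ratio concrete, here is a minimal sketch that pulls the counters out of the `sa-learn --dump magic` text quoted above and computes ham per spam. The helper name `magic_counts` and the parsing regex are my own, written against the three lines shown here; they are not part of SpamAssassin.

```python
import re

# The three lines quoted above from `sa-learn --dump magic`.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0     338770          0  non-token data: nspam
0.000          0    1460807          0  non-token data: nham
"""

def magic_counts(dump_text):
    """Map each 'non-token data' label to its value (third column)."""
    counts = {}
    for line in dump_text.splitlines():
        m = re.match(r"\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+)$", line)
        if m:
            counts[m.group(2).strip()] = int(m.group(1))
    return counts

counts = magic_counts(DUMP)
ham_per_spam = counts["nham"] / counts["nspam"]
print(f"nspam={counts['nspam']} nham={counts['nham']} "
      f"ham:spam = {ham_per_spam:.2f}:1")
```

With the numbers above this comes out at about 4.31 ham for every spam.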

My questions are:
1) Is there any way to change the SpamAssassin settings so that similar messages are marked as spam in the future?
BAYES_50 at 0.8 points is really, really low.

No, it's not. BAYES_50 means "I can't decide", and raising its score would amount to treating "I can't decide" as "spam". That's not justified.

Don't adjust the score of BAYES_50.

It's recommended (if possible) to retain the training corpora so that they can be reviewed and Bayes can be retrained from scratch if necessary.

Your admin is manually vetting user-submitted training messages. Are those messages retained after they have been learned?

You might consider reviewing the training corpus and retraining Bayes from scratch.
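A rough sketch of what a from-scratch retrain could look like, assuming the vetted messages are kept in separate ham and spam directories. The corpus paths are hypothetical placeholders, and the helper `retrain_commands` is only for illustration: it builds the `sa-learn` command lines and prints them, it does not run anything. The `--dbpath` value is the one from the `sa-learn` command earlier in this message.

```python
import shlex

def retrain_commands(dbpath, ham_dir, spam_dir):
    """Build sa-learn invocations for a from-scratch retrain (nothing is executed)."""
    base = ["sa-learn", "--dbpath", dbpath]
    return [
        base + ["--clear"],           # wipe the existing Bayes database
        base + ["--ham", ham_dir],    # relearn the reviewed ham
        base + ["--spam", spam_dir],  # relearn the reviewed spam
        base + ["--dump", "magic"],   # re-check the nspam/nham counters
    ]

cmds = retrain_commands(
    "/var/spool/amavisd/.spamassassin/",  # dbpath from the command above
    "/path/to/corpus/ham",                # hypothetical location
    "/path/to/corpus/spam",               # hypothetical location
)
for argv in cmds:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The commands would need to run as the user that owns the database, and `--clear` is destructive, so it only makes sense once the corpus has actually been reviewed.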


Another note: the "before" result:

Before: spamassassin -D -t <spam

Content analysis details:   (0.0 points, 5.0 required)

 pts rule name              description 
----------------------------------------------------------------------------
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 0.0 HTML_MESSAGE           BODY: HTML included in message

Content analysis details:   (0.8 points, 5.0 required)

...with *no* BAYES hits at all (not even BAYES_50) suggests your SA is *not* using the database whose statistics you reported above.

First: verify which Bayes database your SA install is using, and that it is the one you're training into and getting those stats from.
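As a starting point for that check, here is a small sketch that scans a "Content analysis details" report for any BAYES_* rule. The helper `bayes_hits` is my own, and the two sample reports are the rule lines quoted in this message. An empty result for a given run only shows that Bayes did not fire there; it does not by itself say why. One possibility (an assumption on my part) is that the command line was run as a different user than the one owning the database in /var/spool/amavisd/.spamassassin/.

```python
import re

# Rule lines from the two reports quoted in this message.
BEFORE = """\
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 0.0 HTML_MESSAGE           BODY: HTML included in message
"""

AFTER = """\
0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
0.0 HTML_MESSAGE           BODY: HTML included in message
0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                           [score: 0.5000]
"""

def bayes_hits(report_text):
    """Return the BAYES_* rule names that appear as hits in a report."""
    hits = []
    for line in report_text.splitlines():
        # A hit line starts with a score, followed by the rule name.
        m = re.match(r"\s*-?\d+(?:\.\d+)?\s+(BAYES_\w+)\b", line)
        if m:
            hits.append(m.group(1))
    return hits

print("before:", bayes_hits(BEFORE))
print("after: ", bayes_hits(AFTER))
```

Running the same test message through `spamassassin -D -t` under the account your mail pipeline uses, and comparing the result with the report above, should show whether both are consulting the same database.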


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Maxim IX: Never turn your back on an enemy.
-----------------------------------------------------------------------
 9 days until George Washington's 286th Birthday
