Oops, further investigation indicates that Bayes is "on"--I thought the default was "off" for my config. I would be inclined to turn it off, as I have no decent way of teaching it beyond mass-config into the future--please advise.

JP

On 10/17/10 10:37 PM, Jerry Pape wrote:
Wow, I am grateful for the prompt answers, but I must say they have confused me.

Bayes should not be on in my config, and a subsequent check of the GUI says it's not--this may be wrong.

Further, what are the "scoreset" indexes?

I don't use Bayes because all of my clients are POP mail and they are neither smart nor committed enough to mail back ham/spam to educate the system.

Additionally, when I used Bayes way back when (without manual population) and simply allowed auto-population to occur, I ended up with enormous .spamassassin sub-files that rapidly eclipsed 50% of the client's disk quota.

I am certain that I am missing critical configurational understanding and optimizations, but until your lot kindly educates me--it is what it is and my initial dilemma remains unresolved.

JP

On 10/17/10 7:01 PM, John Hardin wrote:
On Sun, 17 Oct 2010, Jerry Pape wrote:

[Not sure if this is the right place to send this--please correct me if I am in error]

This is the place.

Assessment of this header at http://www.futurequest.net/docs/SA/decode/ yields:

Test                   Score  Description
BAYES_40               0.000  Bayesian spam probability is 20 to 40%
HTML_IMAGE_RATIO_02    0.550  HTML has a low ratio of text to image area
HTML_MESSAGE           0.001  HTML included in message
HTML_MIME_NO_HTML_TAG  1.052  HTML-only message, but there is no HTML tag
MIME_HTML_ONLY         1.672  Message only has text/html MIME parts
RDNS_NONE              0.100  Delivered to trusted network by a host with no rDNS
URIBL_BLACK            1.961  Contains an URL listed in the URIBL blacklist
Total:                 5.336

Clearly 5.336 does not equal 3.8.

There are four score sets to choose from, based on which options you have enabled. The above is for scoreset 1, no BAYES + net tests. Scoreset 3, BAYES + net tests, gives:

  HTML_MIME_NO_HTML_TAG  0.097
  MIME_HTML_ONLY_MULTI   0.001
  HTML_IMAGE_RATIO_02    0.383
  HTML_MESSAGE           0.001
  MIME_HTML_ONLY         1.457
  BAYES_40              -0.185
  URIBL_BLACK            1.955
  RDNS_NONE              0.1
                        -------
                         3.809

These are all of the default scores, and match what you're seeing.
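
As a quick cross-check of the two totals, here is a minimal Python sketch. The rule scores are copied from the two listings above; the dictionary names and the total() helper are only illustrative and are not part of SpamAssassin.

```python
# Per-rule scores copied from the two listings above.
# The names NO_BAYES_NET, BAYES_NET and total() are illustrative only.
NO_BAYES_NET = {
    "BAYES_40": 0.000,
    "HTML_IMAGE_RATIO_02": 0.550,
    "HTML_MESSAGE": 0.001,
    "HTML_MIME_NO_HTML_TAG": 1.052,
    "MIME_HTML_ONLY": 1.672,
    "RDNS_NONE": 0.100,
    "URIBL_BLACK": 1.961,
}

BAYES_NET = {
    "HTML_MIME_NO_HTML_TAG": 0.097,
    "MIME_HTML_ONLY_MULTI": 0.001,
    "HTML_IMAGE_RATIO_02": 0.383,
    "HTML_MESSAGE": 0.001,
    "MIME_HTML_ONLY": 1.457,
    "BAYES_40": -0.185,
    "URIBL_BLACK": 1.955,
    "RDNS_NONE": 0.1,
}


def total(scores):
    """Sum the per-rule scores, rounded to three decimals like the listings."""
    return round(sum(scores.values()), 3)


print(total(NO_BAYES_NET))  # 5.336
print(total(BAYES_NET))     # 3.809
```

The same rules hit in both cases; only the per-rule scores differ, which is why the two totals disagree.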

I have no idea how to regress and resolve this problem.

First off, you need to review your Bayes training. An obviously spammy message shouldn't be hitting BAYES_40. Properly-trained Bayes, hitting BAYES_99, would have scored 7.494 on that message.
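
The 7.494 figure follows from swapping the BAYES_40 score for the BAYES_99 one. A small Python sketch of that arithmetic follows. Note that the BAYES_99 value of 3.5 is not quoted anywhere in this thread; it is an assumption inferred from the two totals (7.494 - 3.809 - 0.185), so check it against your own rule scores. The helper name is hypothetical.

```python
def rescored_total(total, old_score, new_score):
    """Replace one rule's contribution to a message total."""
    return round(total - old_score + new_score, 3)


BAYES_40 = -0.185  # from the listing above
BAYES_99 = 3.5     # assumed: inferred from the totals, not quoted in the thread

print(rescored_total(3.809, BAYES_40, BAYES_99))  # 7.494
```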

For analysis in general...

This will put the individual rule scores into the headers:

add_header all Status "_YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES_ autolearn=_AUTOLEARN_ version=_VERSION_"
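
Once that header is in place, the per-rule scores can be pulled out of a saved message for comparison. The following is a rough Python sketch, assuming the tests= field is rendered as comma-separated NAME=score pairs and is followed by autolearn=, as in the template above. The function name and the sample header value (including its version string) are made up for illustration.

```python
import re


def parse_tests_scores(status_value):
    """Return {rule_name: score} from the tests= field of a Status header.

    Assumes the field layout of the add_header template above:
    tests=NAME=score,NAME=score,... followed by autolearn=...
    """
    match = re.search(r"tests=(.*?)\s+autolearn=", status_value, re.DOTALL)
    if not match:
        return {}
    # Long headers are folded across lines; strip the folding whitespace.
    field = re.sub(r"\s+", "", match.group(1))
    scores = {}
    for item in field.split(","):
        name, sep, value = item.partition("=")
        if sep:  # skips entries without a score, such as "none"
            scores[name] = float(value)
    return scores


# Hypothetical header value, folded the way a mail server might fold it.
sample = (
    "No, score=3.8 required=5.0 tests=BAYES_40=-0.185,\n"
    "\tHTML_MESSAGE=0.001,URIBL_BLACK=1.955 autolearn=no version=3.2.5"
)

for rule, score in sorted(parse_tests_scores(sample).items()):
    print(f"{rule:<24}{score:>8.3f}")
```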

"spamassassin --debug area=rules <test_msg_file" is often helpful.

However:

The nature of spam changes over time. 3.2, which is only getting critical bug fixes now, will become steadily less effective as time passes and the spammers evolve new tricks. It's getting to the point that you should really consider upgrading to the latest 3.3 release.


