On Thu, 4 Sep 2014, Timothy Murphy wrote:

> 1) Is there a simple way of dumping email with an empty To: header?

If by "dump" you mean "discard", this simple test might be better done in your MTA. However, "poison pill" rules, where a single test decides a message's fate on its own, are generally discouraged (certain DNSBLs excepted).

> This seems invariably to be spam, and I'm surprised SA doesn't seem
> to score it highly.

Probably because, even if it's a good spam sign, it either isn't very common or appears together with enough other spam signs that it doesn't get a high score by itself.

If you post some spamples of such to pastebin we'll take a look.

> Maybe it doesn't consider this to be a header?

Yes, it does. There are rules that check for no TO or CC. For example:

http://ruleqa.spamassassin.org/20140902-r1621946-n/REPLYTO_WITHOUT_TO_CC/detail

If you want to score for "no TO or CC header", you could do this:

  meta  NO_TO_CC   !__TOCC_EXISTS

> 2) Does "autolearn" actually remove spam with a very high score?
> Or does it still get marked as spam by SA and passed on?

"autolearn" submits the message to the Bayes backend for training. This can affect the scoring of subsequently scanned messages, but it does not change the score of the message being learned.

Also: SA does not directly have anything to do with the delivery process. All it does is generate a spamminess score. *Something else* has to interpret that score to decide the ultimate destination of the message: inbox, quarantine or bit bucket.
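To make that division of labour concrete, here is a rough sketch of the kind of decision the "something else" makes. It assumes the scanned message carries a header of the form "X-Spam-Status: Yes, score=12.3 required=5.0 ..."; the function name route_by_score and the 15.0 discard cutoff are made up for illustration and are not SA defaults. In practice this logic usually lives in your MTA, procmail or sieve rules rather than a shell function.

```shell
# Hypothetical helper: read a scanned message on stdin, print a destination.
# Assumes an "X-Spam-Status: Yes, score=N required=M ..." header is present.
# The discard cutoff (15.0) is an arbitrary example value.
route_by_score() {
  awk -v discard=15.0 '
    /^$/ { exit }                      # blank line: end of headers
    /^X-Spam-Status:/ {
      flagged = ($2 == "Yes,")         # SA already compared score to required
      for (i = 2; i <= NF; i++)
        if ($i ~ /^score=/) score = substr($i, 7) + 0
    }
    END {
      if (!flagged)              print "inbox"
      else if (score >= discard) print "discard"
      else                       print "quarantine"
    }'
}
# Usage: route_by_score < scanned-message.eml
```

SA only supplies the header; where each of the three outcomes actually leads is entirely up to the delivery side.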

> 3) As will be obvious, I am not a student of SA;
> I just use the default setting, which seems to work well enough for me.
> But I'm a little surprised that more or less identical email
> that I have marked as spam many times and passed through sa-learn
> still seems to get through.

That would seem to indicate a problem with Bayes.

> Is there a simple check to make sure sa-learn is working?

You will see BAYES_* rule hits on messages if Bayes is working. You have to learn a minimum number of spam *and* ham messages before it will start working.
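A quick way to look for those hits is to pull the BAYES_ rule names out of a scanned message. This is only a sketch: bayes_hits is a made-up helper name, it assumes your grep supports -o (GNU and BSD grep do), and it searches the whole message, so a body that happens to mention a BAYES_ rule would also match.

```shell
# Hypothetical helper: list the distinct BAYES_nn rule names found in a
# message read from stdin. No output means no Bayes rule hit was recorded.
bayes_hits() {
  grep -o 'BAYES_[0-9][0-9]*' | sort -u
}
# Usage: bayes_hits < scanned-message.eml
```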

The following command reports statistics about the Bayes database, including how many spam (nspam) and ham (nham) messages have been learned:

  /usr/bin/sa-learn --dump magic
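If you'd rather not read the dump by eye, something like the following could summarize it. It is a sketch under assumptions: bayes_ready is a made-up name, the dump lines are assumed to end in "non-token data: nspam" and "non-token data: nham" with the count in the third column, and the threshold of 200 assumes bayes_min_spam_num and bayes_min_ham_num are at their defaults.

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# report whether both counts reach the assumed default minimum of 200.
bayes_ready() {
  awk -v min=200 '
    /non-token data: nspam$/ { nspam = $3 }
    /non-token data: nham$/  { nham  = $3 }
    END {
      printf "nspam=%d nham=%d\n", nspam, nham
      if (nspam >= min && nham >= min)
        print "Bayes has enough training data"
      else
        print "Bayes is not active yet: needs " min " spam and " min " ham"
    }'
}
# Usage: sa-learn --dump magic | bayes_ready
```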

The most common mistake is to train Bayes as a user that is not the same user that SA is running under to scan messages - i.e., you're training the wrong Bayes database. Check which user spamd is running under, and which user you're running sa-learn as. They should be the same user.
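A minimal sketch of that comparison follows. The helper name check_bayes_user is made up, and the ps invocation in the usage comment assumes Linux procps (-C selects by command name) and a daemon called spamd. Also bear in mind that a spamd started as root may switch to the user spamc hands it, in which case that user is the one to compare against.

```shell
# Hypothetical helper: compare the account that scans mail with the account
# that trains Bayes. The second argument defaults to the current user.
check_bayes_user() {
  scan_user=$1
  train_user=${2:-$(id -un)}
  if [ "$scan_user" = "$train_user" ]; then
    echo "OK: scanning and training both run as $scan_user"
  else
    echo "MISMATCH: scanning runs as $scan_user, training as $train_user"
  fi
}
# Usage (Linux procps assumed):
#   check_bayes_user "$(ps -o user= -C spamd | sort -u | head -n 1)"
```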

> 4) I haven't found a short and simple SA tutorial,
> explaining how SA works,
> with a few tests that one might add to the default,
> and a couple of checks one could try to make sure it is working.

The definitive test to check whether SA is scanning messages is to send a message containing the GTUBE string; it should always be detected and score 1000 points. Google "spam GTUBE" for more details.
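For a quick local check, a test message can be generated like this. It's a sketch: gtube_message is a made-up helper and the example.com addresses are placeholders. Piping it through "spamassassin -t" only exercises the command-line scanner; to prove your mail path is being scanned you still need to send the message through it as real mail.

```shell
# Hypothetical helper: print a minimal message whose body is the GTUBE
# test string. The addresses are placeholders.
gtube_message() {
  cat <<'EOF'
From: sender@example.com
To: recipient@example.com
Subject: GTUBE test

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
EOF
}
# Usage, assuming the spamassassin command-line client is installed:
#   gtube_message | spamassassin -t
```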


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The tree of freedom must be freshened from time to time
  with the blood of tyrants and tyrannosaurs.
                     -- DW, commenting on the GM6 Lynx .50BMG bullpup
-----------------------------------------------------------------------
 13 days until the 227th anniversary of the signing of the U.S. Constitution
