On Tue, 31 Mar 2009, John Hardin wrote:
On Tue, 31 Mar 2009, Lucio Chiappetti wrote:

>>  I suggest you also consider either disabling autolearn, or push the
>>  learn-as-ham threshold lower.
>
> I would be glad to do the latter.
> Would that be one of those two in /usr/share/spamassassin/10_misc.cf ?
>
> bayes_auto_learn_threshold_nonspam      0.1
> bayes_auto_learn_threshold_spam         12.0

Yes. Try putting this in /etc/mail/spamassassin/local.cf:

   bayes_auto_learn_threshold_nonspam      -2

(That may be overdoing it a bit, considering I don't know how your ham scores generally run...)
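
For reference, the two alternatives mentioned above (disabling autolearn, or lowering the learn-as-ham threshold) might look roughly like this in local.cf. This is only a sketch: the -2 is the value suggested here, the 12.0 is the default quoted earlier, and neither is tuned to any particular site.

   # Alternative 1: switch autolearning off and rely on manual training only
   # (left commented out so that it does not conflict with alternative 2)
   # bayes_auto_learn                      0

   # Alternative 2: keep autolearning, but require a lower score before
   # a message is auto-learned as ham
   bayes_auto_learn                        1
   bayes_auto_learn_threshold_nonspam      -2
   bayes_auto_learn_threshold_spam         12.0

Running "spamassassin --lint" after editing should catch syntax mistakes before the change goes live.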

Ah but that I know. I have a daily web log with messages ordered by score, sender and recipients (no subjects for privacy). I have (1 page is 30-40 messages) :

 3 pages with scores below -2
10 pages with scores -1 to -2
 6 pages with scores 0 to -1     (all this "code green")
 5 pages between  0 and 1
 4       between  1 and 2
 3       between  2 and 4        ("code yellow")
 half page between 4 and 4.5     ("code orange")
----------------------------
 1 page between 4.5 and 5        ("code pink", spam)
11 pages between 5 and 10        ("code red")
 6       between 10 and 12
16       above 12                ("code dark red")

Apparently (attention !) the green and most of the yellow look OK (they come from known users in an academic domain and go to a single local user or to a list of known collaborators) ... suspicious yellow ones are stuff coming from strange domains AND directed to a list of local users who usually do not work together.

... but I found some stuff with negative scores like -1.3 coming in groups, relayed via MXes in odd Mexican and Brazilian domains, but with the From faked as a local user (usually "from x to x", i.e. the same as the recipient).
Could it be that our AWL got screwed too ?

And in fact I've just re-enabled my procmail trap of the latest kind of spam, so I could read the Received header, and verified in the mail log for "Passed CLEAN" (since we do not write the score for ham going through) and found a bunch coming from Poland (faked as local user) which were assigned score -1.3

That said, do you confirm that I should use the ham threshold of -2 (that means ONLY the messages with a score < -2 will trigger bayes_00 or thereabouts, doesn't it ?) ? And that I should not also lower the spam threshold of 12 (all messages above 6 are definitely spam) ? See also the question below on AWL.

Additionally: isn't there any provision built into spamassassin to trap messages claiming to be from local users but not arriving via the local MXes ? (I have such a thing in procmail, though I won't trust it fully.)


Note that it won't have immediately obvious results; this is more of a long-term tuning change. You need to train those particular FN messages as spam to fix the problem you originally asked about.

I wonder if it would be better to reset everything from scratch.
And even resetting the AWL ...


In reply also to Karsten Brockelmann:

You should teach all your users to at least dump spam that slipped
through to the training spool.

We have many colourful expressions like "sweep the sea" or "wash the donkey's breast" to indicate a lost cause. The few knowledgeable users do it. The others won't (and they are the ones who complain).

Bad attitude. :)  You are catching these en masse with the procmail
recipe. Don't discard them, but rather dump them into a dedicated
folder.

My PERSONAL procmail rules are for my own use (and run at my own risk). Anyhow, it was easy to repoint the particular rule from /dev/null to the folder where I manually dump the suspect spam every day (just repointing a soft link). So I'll have a larger base for learning (and I've used it already in the tests above !)

Also, since you are able to write a procmail recipe for them, writing a custom SA rule is just as easy. Score it a point or two...
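
For illustration only, a rule of that kind might look roughly like the sketch below. It assumes a placeholder local domain example.org and the made-up rule names __LOCAL_FROM_OURS and LOCAL_FAKED_SENDER. ALL_TRUSTED is a stock SpamAssassin rule, and it only behaves sensibly if trusted_networks and internal_networks are set correctly for the site.

   # Sub-rule (names starting with __ carry no score of their own):
   # the From: address claims to be in our domain (example.org is a placeholder)
   header   __LOCAL_FROM_OURS   From:addr =~ /\@example\.org$/i

   # Fire only when the message did NOT arrive exclusively via trusted relays
   meta     LOCAL_FAKED_SENDER  (__LOCAL_FROM_OURS && !ALL_TRUSTED)
   describe LOCAL_FAKED_SENDER  From claims a local user but mail came from outside
   score    LOCAL_FAKED_SENDER  1.5

Legitimate users sending from outside without going through the local relays would hit this too, which is one reason to keep the score at a point or two rather than making it decisive.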

That I won't do, for two reasons. One is that I'm paid to do astrophysics and not to nurse my colleagues. The second is a matter of principle : one should not waste time "chasing after" any specific spam (I've done it already too many times for my own use) when there are tools like SA, DCC and Razor which should do that automatically. Except in very exceptional cases.


--
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
-----------------------------------------------------------------------
"Nature" on government cuts to research       http://snipurl.com/4erid
"Nature" e i tagli del governo alla ricerca   http://snipurl.com/4erko
