On Tue, 31 Mar 2009, John Hardin wrote:
> On Tue, 31 Mar 2009, Lucio Chiappetti wrote:
>>> I suggest you also consider either disabling autolearn, or push the
>>> learn-as-ham threshold lower.
>>
>> I would be glad to do the latter,
>> Would that be one of those two in /usr/share/spamassassin/10_misc.cf ?
>>
>> bayes_auto_learn_threshold_nonspam 0.1
>> bayes_auto_learn_threshold_spam 12.0
>
> Yes. Try putting this in /etc/mail/spamassassin/local.cf:
>
> bayes_auto_learn_threshold_nonspam -2
>
> (That may be overdoing it a bit, considering I don't know how your ham
> scores generally run...)

Ah, but that I know. I have a daily web log with messages ordered by score,
sender and recipients (no subjects, for privacy). The breakdown is (1 page
is 30-40 messages):
3 pages with scores below -2
10 pages with scores -1 to -2
6 pages with scores 0 to -1 (all this "code green")
5 pages between 0 and 1
4 between 1 and 2
3 between 2 and 4 ("code yellow")
half page between 4 and 4.5 ("code orange")
----------------------------
1 page between 4.5 and 5 ("code pink", spam)
11 pages between 5 and 10 ("code red")
6 between 10 and 12
16 above 12 ("code dark red")
Apparently (attention !) the green and most of the yellow look OK (they
come from known users in an academic domain and go to a single local user
or to a list of known collaborators) ... suspicious yellow ones are stuff
coming from strange domains AND directed to a list of local users who
usually do not work together.
... but I found some stuff with negative scores like -1.3 coming in
groups, with an MX in odd Mexican and Brazilian domains, but with a From
faked as a local user (usually "from x to x", same as the recipient).
Could it be that our AWL got screwed up too?
And in fact I've just re-enabled my procmail trap for the latest kind of
spam, so I could read the Received headers, and checked the mail log
for "Passed CLEAN" (since we do not write the score for ham going through).
I found a bunch coming from Poland (faked as a local user) which were
assigned a score of -1.3.
That said, do you confirm I should use the ham threshold of -2 (that means
ONLY the messages with a score < -2 will trigger bayes_00 or thereabouts,
doesn't it?)? And should I not also lower the spam threshold of 12 (all
messages above 6 are definitely spam)? See also the question below on AWL.
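
For reference, the settings under discussion could be written out as a
local.cf fragment like the one below. This is only a sketch: -2 is the
value suggested above, the spam threshold is left at the shipped default
quoted earlier, and the commented-out line is the alternative of disabling
autolearn altogether. As far as I understand it, the nonspam threshold
decides which messages are fed to the Bayes learner as ham; it does not
by itself decide which messages hit BAYES_00.

```
# /etc/mail/spamassassin/local.cf  (site-wide overrides)

# Autolearn as ham only messages scoring below -2 (shipped default: 0.1)
bayes_auto_learn_threshold_nonspam -2.0

# Autolearn-as-spam threshold, left at the shipped default
bayes_auto_learn_threshold_spam 12.0

# Alternative: switch autolearning off entirely
# bayes_auto_learn 0
```
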
Additionally: isn't there any provision built into SpamAssassin to trap
messages claiming to be from local users but not coming via the local MXs?
(I have such a thing in procmail, though I won't trust it in full.)
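
One way such a check might be expressed as a local rule is sketched below.
The rule names and the domain example.org are hypothetical placeholders,
and the sketch assumes trusted_networks and internal_networks are set
correctly, since it leans on the stock ALL_TRUSTED rule to tell whether a
message passed only through trusted relays. Legitimate users sending from
outside through a foreign relay would also match, hence the modest score.

```
# Hypothetical local rule; example.org stands in for the local domain.
# Assumes trusted_networks / internal_networks are configured correctly.

# From: address claims to be in the local domain
header   __FROM_LOCAL_DOMAIN    From:addr =~ /\@example\.org$/i

# ALL_TRUSTED fires when the message passed only through trusted relays;
# negating it catches mail that arrived from outside
meta     LOCAL_FROM_UNTRUSTED   (__FROM_LOCAL_DOMAIN && !ALL_TRUSTED)
describe LOCAL_FROM_UNTRUSTED   Local From: but not via trusted relays
score    LOCAL_FROM_UNTRUSTED   1.5
```
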
> Note that it won't have immediately obvious results; this is more of a
> long-term tuning change. You need to train those particular FN messages as
> spam to fix the problem you originally asked about.

I wonder if it would be better to reset everything from scratch,
and even to reset the AWL ...
In reply also to Karsten Brockelmann:
>>> You should teach all your users to at least dump spam that slipped
>>> through to the training spool.
>>
>> we have many colourful expressions like "sweep the sea" or "wash the
>> donkey's breast" to indicate a lost cause. The few knowledgeable users
>> do it. The others won't (and they are the ones who complain).
>
> Bad attitude. :) You are catching these en masse with the procmail
> recipe. Don't discard them, but rather dump them into a dedicated
> folder.

My PERSONAL procmail rules are for my own use (and run at my own risk).
Anyhow, it was easy to repoint that particular rule from /dev/null to the
folder where I manually dump the suspect spam every day (just repointing a
soft link). So I'll have a larger base for learning (and I've already used
it in the tests above!).
> Also, since you are able to write a procmail recipe for them, writing a
> custom SA rule is just as easy. Score it a point or two...

That I won't do, for two reasons. One is that I'm paid to do astrophysics
and not to nurse my colleagues. The second is a matter of principle: one
should not waste time "chasing after" any specific spam (I've done it
already too many times for my own use) when there are tools like SA, DCC
and Razor which should do that automatically, except in very exceptional
cases.
--
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
-----------------------------------------------------------------------
"Nature" on government cuts to research http://snipurl.com/4erid
"Nature" e i tagli del governo alla ricerca http://snipurl.com/4erko