Re: quirks with bayes ?

Lucio Chiappetti Tue, 31 Mar 2009 10:14:36 -0700

On Tue, 31 Mar 2009, John Hardin wrote:

On Tue, 31 Mar 2009, Lucio Chiappetti wrote:

>>  I suggest you also consider either disabling autolearn, or push the
>>  learn-as-ham threshold lower.
>
> I would be glad to do the latter. Would that be one of those two in /usr/share/spamassassin/10_misc.cf ?
>
> bayes_auto_learn_threshold_nonspam      0.1
> bayes_auto_learn_threshold_spam         12.0

Yes. Try putting this in /etc/mail/spamassassin/local.cf:

   bayes_auto_learn_threshold_nonspam      -2
(That may be overdoing it a bit, considering I don't know how your ham scores generally run...)

Ah, but that I know. I have a daily web log with messages ordered by score, sender and recipients (no subjects, for privacy). I have (1 page is 30-40 messages) :


 3 pages with scores below -2
10 pages with scores -1 to -2
 6 pages with scores 0 to -1     (all this "code green")
 5 pages between  0 and 1
 4       between  1 and 2
 3       between  2 and 4        ("code yellow")
 half page between 4 and 4.5     ("code orange")
----------------------------
 1 page between 4.5 and 5        ("code pink", spam)
11 pages between 5 and 10        ("code red")
 6       between 10 and 12
16       above 12                ("code dark red")

Apparently (attention !) the green and most of the yellow look OK (they come from known users in an academic domain and go to a single local user or to a list of known collaborators) ... the suspicious yellow ones are stuff coming from strange domains AND directed to a list of local users who usually do not work together.

... but I found some stuff with negative scores like -1.3 coming in groups, with an MX from odd Mexican and Brazilian domains, but with a From faked as a local user (usually "from x to x", same as the recipient).

Could it be that our AWL got screwed too ?

And in fact I've just re-enabled my procmail trap for the latest kind of spam, so I could read the Received header, and checked the mail log for "Passed CLEAN" (since we do not write the score for ham going through), and found a bunch coming from Poland (faked as a local user) which were assigned a score of -1.3

That said, do you confirm that I should use the ham threshold of -2 (that means ONLY the messages with a score < -2 will trigger bayes_00 or thereabout, doesn't it ?) ? And that I should not also lower the spam threshold of 12 (all messages above 6 are definitely spam) ? See also the question below on AWL.
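
To put rough numbers on that question, here is a minimal Python sketch that uses only the page counts listed earlier. It counts how many log pages lie entirely below a candidate ham threshold or entirely above the spam threshold. The names `PAGES`, `pages_below` and `pages_above` are made up for illustration. Two assumptions apply: pages are treated as equally sized, and the logged final score stands in for the score SpamAssassin uses when deciding to autolearn, which leaves out some rules (the Bayes rules themselves, for instance). The result is therefore only an approximate proportion.

```python
# Pages of the daily log per score bin, copied from the list above.
# Each entry is (lower bound, upper bound, pages); None marks an open end.
# One page is roughly 30-40 messages, so these are coarse proportions only.
PAGES = [
    (None, -2.0, 3.0),
    (-2.0, -1.0, 10.0),
    (-1.0, 0.0, 6.0),
    (0.0, 1.0, 5.0),
    (1.0, 2.0, 4.0),
    (2.0, 4.0, 3.0),
    (4.0, 4.5, 0.5),
    (4.5, 5.0, 1.0),
    (5.0, 10.0, 11.0),
    (10.0, 12.0, 6.0),
    (12.0, None, 16.0),
]


def pages_below(threshold):
    """Sum the pages of bins lying entirely below `threshold`.

    Bins that straddle the threshold are skipped, so this is a lower bound.
    """
    return sum(p for _, hi, p in PAGES if hi is not None and hi <= threshold)


def pages_above(threshold):
    """Sum the pages of bins lying entirely at or above `threshold`."""
    return sum(p for lo, _, p in PAGES if lo is not None and lo >= threshold)


total = sum(p for _, _, p in PAGES)

if __name__ == "__main__":
    for ham_threshold in (0.1, -2.0):
        print(f"ham threshold {ham_threshold:>4}: "
              f"{pages_below(ham_threshold)} of {total} pages eligible")
    print(f"spam threshold 12.0: {pages_above(12.0)} of {total} pages eligible")
```

Under these assumptions, moving the ham threshold from 0.1 to -2 shrinks the set of messages that could be autolearned as ham from the whole "code green" range to just the pages scoring below -2, while the spam side is untouched.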

Additionally: isn't there any provision built into SpamAssassin to trap messages claiming to be from local users but not coming via the local MXs ? (I have such a thing in procmail, though I won't trust it in full)

Note that it won't have immediately obvious results; this is more of a long-term tuning change. You need to train those particular FN messages as spam to fix the problem you originally asked about.


I wonder if it would be better to reset everything from scratch.
And even resetting the AWL ...


In reply also to Karsten Brockelmann:

You should teach all your users to at least dump spam that slipped
through to the training spool.

We have many colourful expressions like "sweep the sea" or "wash the donkey's breast" to indicate a lost cause. The few knowledgeable users do it. The others won't (and they are the ones who complain).

Bad attitude. :)  You are catching these en masse with the procmail
recipe. Don't discard them, but rather dump them into a dedicated
folder.

My PERSONAL procmail rules are for my own use (and run at my own risk). Anyhow, it was easy to repoint the particular rule from /dev/null to the folder where I manually dump the suspect spam every day (just repointing a soft link). So I'll have a larger base for learning (and I've used it already in the tests above !)

Also, since you are able to write a procmail recipe for them, writing a custom SA rule is just as easy. Score it a point or two...

That I won't do, for two reasons. One is that I'm paid to do astrophysics and not to nurse my colleagues. The second is a matter of principle : one should not waste time "chasing after" any specific spam (I've done it already too many times for my own use) when there are tools like SA, DCC and Razor which should do that automatically. Except in very exceptional cases.



--
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
-----------------------------------------------------------------------
"Nature" on government cuts to research       http://snipurl.com/4erid
"Nature" e i tagli del governo alla ricerca   http://snipurl.com/4erko
