Re: Text contained in HTML comments causing BAYES_00 to classify as non-spam

Karsten Bräckelmann Wed, 04 Aug 2010 14:54:37 -0700

On Wed, 2010-08-04 at 14:39 -0700, Happy Chap wrote:
> Bowie Bailey wrote:


> > Stupid question here, but are you sure you are training the same
> > database that SA is using?
> > 
> > This is a fairly frequent problem.  Common cases are:
> > 
> > 1) SA being called as 'mailuser' and you are doing manual training on
> > root's database.
> > 2) You are manually training everything to the 'mailuser' database, but
> > SA is actually using per-user databases.
> 
> Good question Bowie. 
> 
> I don't think that's happening. We do have a generic system-wide procmailrc
> but its first command is for a DROPPRIVS, which I think/thought then runs
> as the specific user and in the procmail recipe a call is then made to spamc
> (although it is called without the -u option because, as I say, I think by
> issuing a DROPPRIVS it's running as that user so -u shouldn't be necessary).

*nod*

> If this doesn't sound right, by all means say - it's quite a while since I
> set all this up!
> 
> Training is definitely happening on a per user basis (i.e. the script is
> calling sa-learn -u).

So when you confirmed by running sa-learn --dump magic previously, did
you first su to the user in question? The Bayes database does exist in
the user's $HOME/.spamassassin/, right?
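
A rough sketch of that check, in case it helps. It assumes the default
file-based (DBM) Bayes storage, where the token database is a file named
bayes_toks under the user's ~/.spamassassin/ directory. The function name
bayes_db_status is made up for this example, and none of this applies if
bayes_path has been changed or an SQL backend is in use.

```shell
#!/bin/sh
# Sketch: report whether a per-user Bayes DB exists under a given home
# directory. bayes_db_status is a hypothetical helper name, and the path
# assumes the default DBM storage (token file named bayes_toks).
bayes_db_status() {
    home_dir="$1"
    db_dir="$home_dir/.spamassassin"
    if [ -f "$db_dir/bayes_toks" ]; then
        echo "found: $db_dir/bayes_toks"
    else
        echo "missing: $db_dir/bayes_toks"
        return 1
    fi
}

# Typical use, run as the user whose mail is being scanned (for example
# after su'ing to that user):
#   bayes_db_status "$HOME" && sa-learn --dump magic
```

If the file is missing for the user spamc runs as, the training and the
scanning are most likely not hitting the same database.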

Even when running per-user, a site-wide Bayes DB is still possible IIRC,
e.g. if you use an SQL backend.


Anyway, since you still get BAYES_00 on these, you really should have a
close look at the tokens Bayes is most confident about, and why. With
some training, the result most certainly should at least move up toward
BAYES_50, not stay at 00. The tokens should help tell you why.
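
One way to get at those tokens, as a sketch only. It assumes the Bayes
debug channel prints one line per token containing the string
'bayes: token', which is how I recall the output looking; adjust the
pattern if your version differs. The function name bayes_token_lines and
the file name message.eml are made up for this example.

```shell
#!/bin/sh
# Sketch: keep only the per-token lines from SpamAssassin's Bayes debug
# output read on stdin. bayes_token_lines is a hypothetical helper name,
# and the 'bayes: token' pattern is an assumption about the debug format.
bayes_token_lines() {
    # "|| true" keeps the exit status clean when nothing matches.
    grep -F 'bayes: token' || true
}

# Typical use, run as the affected user on a saved copy of one of the
# misclassified messages (message.eml is a placeholder):
#   spamassassin -D bayes -t < message.eml 2>&1 | bayes_token_lines
```

Tokens with a probability close to 0 are the ones pulling the message
toward BAYES_00.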


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
