On Wed, 2009-01-28 at 22:36 +0000, RW wrote:
> On Wed, 28 Jan 2009 22:02:59 +0100
> Karsten Bräckelmann <guent...@rudersport.de> wrote:
> > > On Wed, 28 Jan 2009, RW wrote:
> > >
> > > > I was wondering if it's possible to control autolearning based
> > > > on rules.
> >
> > No. And even tweaking the various thresholds will not help, since
> > auto-learning is based on the score *without* Bayes, etc.
> >
> > > > I'm scoring DSPAM into Spamassassin, and since DSPAM autolearns
> >
> > By that you mean... Using the DSPAM plugin for SA? And the rule you
> > want to base auto-learning upon is the DSPAM plugin one?
>
> No, is there any point?

Err, then I don't understand the "auto-learning from rules" in your
Subject. What do you mean by that?

> I just pass it through dspam and then score like this:
[...]
> I combine this with some sieve rules that file into Junk and Junk.high
> folders at the scores 5 and 30. Junk.high is effectively discarded. I
> check the Junk folder and move everything to the training folders,
> along with any spam that gets through. Additionally a sieve rule
> autofiles anything over 30 that dspam didn't get into the learn-spam
> folder.
>
> That means that every single mail misclassified by dspam's
> autolearning will get reclassified, but it doesn't imply the same for
> Bayes unless Bayes autolearns in line with dspam.

You have special handling for mails dspam missed. So you are right, it
won't do the same for SA Bayes, unless you add some equivalent special
handling for SA...

> However, thinking about it a bit more, I think that the only real
> problem is that ham that scores between 0.1 and 5.0
> won't be learned as ham, and I can fix that by moving the autolearn
> threshold up to 4.9.

Eek! No, this is wrong and gives me the creeps.

As I've mentioned before (hey, see your quote :), certain rules like
Bayes will NOT be taken into account for the threshold. Also, the
scores used for auto-learning evaluation come from a non-Bayes score
set. See the docs.
http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

That means that a mail scoring your threshold PLUS the BAYES_00 score
can be learned as ham. Possibly even much higher, since auto-learn uses
score set 0 or 1... Think about this for a moment.

AFAIK, there is no clean way of tricking SA into learning *everything*
above and below a given threshold. Also, a certain gray area is better
not learned automatically. Seriously.

False learning will *immediately* have an impact on further results.
Learning after a manual review is slower, but it is not affected by
bootstrapping even more FNs and FPs out of its own ass.

> BTW am I correct in assuming that my dspam header rules
> in /usr/local/etc/mail/spamassassin/local.cf will contribute to
> autolearning.

Yes. Unless you set tflags noautolearn for your rules. See the above
doc, and the section Rule Definitions here:

http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html

  guenther

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
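
A minimal sketch of how the settings discussed above might look in
local.cf. The rule name DSPAM_SPAM, its header test and its score are
assumptions for illustration only (the actual dspam rules were elided
above). The threshold values shown are the defaults given in the
AutoLearnThreshold doc.

  # Hypothetical dspam header rule; name, pattern and score are assumed
  header   DSPAM_SPAM  X-DSPAM-Result =~ /^Spam/
  score    DSPAM_SPAM  2.0

  # The rule still counts toward the final score, but is ignored when
  # deciding whether to auto-learn
  tflags   DSPAM_SPAM  noautolearn

  # Documented defaults: learn as ham below 0.1, as spam above 12.0
  bayes_auto_learn_threshold_nonspam  0.1
  bayes_auto_learn_threshold_spam     12.0

With the thresholds left at these defaults, mail scoring in the gray
area between them is not learned automatically and stays available for
manual training.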