Carlo Wood wrote:

>On Wed, May 04, 2005 at 01:03:18PM -0400, Matt Kettler wrote:
>
>>>In -well- every mail. That is not too weird, since
>>>this is my domain! Why does rate 'alinoe.com' and 'com'
>>>and 'carlo' as spammy tokens? Is that normal?
>>>
>>No, it's not normal.
>>
>>Have you been training your bayes using forwarded messages?
>>
>>In general it looks like your bayes has been very heavily trained on
>>spam that was addressed To: you, and almost no nonspam messages
>>addressed To: you. This is something that could happen if you were
>>forwarding mail for training, or if you used someone elses nonspam for
>>training (and little or none of your own), but did use your own spam.
>>
>
>Yeah... the point is, I receive mail on my firewall machine.
>There are no accounts there, but I want to run spamassassin
>there so that it's cpu cycles don't bother me on my working
>machine. However, I don't want the bayesian database to autolearn:
>I want it to only learn correctly. So, I have auto-learn off.
>The tagged mail is then sent to another machine that sorts it
>into mailboxes with procmail. All mail is THERE decided to be
>REALLY ham or spam (under my guidance) and is then forwarded
>back to the firewall machine (two special accounts there)
>which is then fed to the bayes. I didn't realize that this
>didn't work.
>
That should work the way you are doing it, as long as you're careful. My warnings about forwarding were aimed at people who use a mail client's "forward" feature, which strips the original headers and creates new ones. However, you do have to make sure you train on both ham and spam, and that the training ratio isn't too wildly off. 1:9 should be the worst ham:spam ratio you use with this kind of setup. If your training is 99% spam on the To:[EMAIL PROTECTED] address, then SA's bayes is going to assume that carlo gets nothing but spam, and all mail sent there will be biased a bit by this.
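
If it helps, here is a rough Python sketch of checking that ratio before feeding the two training mailboxes to bayes. The function names are my own placeholders and nothing here is part of SA; it also assumes the ham and spam training mail is stored in mbox format, which may not match your setup.

```python
import mailbox

# The 1:9 ham:spam limit suggested above: at most 9 spam per ham.
WORST_SPAM_PER_HAM = 9


def count_messages(mbox_path):
    """Count the messages in an mbox file (assumed training folder format)."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def ratio_ok(ham_count, spam_count, worst_spam_per_ham=WORST_SPAM_PER_HAM):
    """Return True if spam is no more than worst_spam_per_ham times ham."""
    if ham_count == 0:
        # With no ham at all, any spam training is lopsided.
        return spam_count == 0
    return spam_count <= worst_spam_per_ham * ham_count
```

You would call count_messages() on each of the two training mailboxes and hold back some spam (or find more ham) whenever ratio_ok() comes back False.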