Carlo Wood wrote:

>On Wed, May 04, 2005 at 01:03:18PM -0400, Matt Kettler wrote:
>
>>>In -well- every mail.  That is not too weird, since
>>>this is my domain!  Why does it rate 'alinoe.com' and 'com'
>>>and 'carlo' as spammy tokens?  Is that normal?
>>>
>>No, it's not normal.
>>
>>Have you been training your bayes using forwarded messages? 
>>
>>In general it looks like your bayes has been very heavily trained on
>>spam that was addressed To: you, and almost no nonspam messages
>>addressed To: you. This is something that could happen if you were
>>forwarding mail for training, or if you used someone else's nonspam for
>>training (and little or none of your own), but did use your own spam.
>
>Yeah... the point is, I receive mail on my firewall machine.
>There are no accounts there, but I want to run spamassassin
>there so that its CPU cycles don't bother me on my working
>machine.  However, I don't want the bayesian database to autolearn:
>I want it to only learn correctly.  So, I have auto-learn off.
>The tagged mail is then sent to another machine that sorts it
>into mailboxes with procmail.  All mail is THERE decided to be
>REALLY ham or spam (under my guidance) and is then forwarded
>back to the firewall machine (two special accounts there)
>which is then fed to the bayes.  I didn't realize that this
>didn't work.
>

That should work the way you are doing it, as long as the messages reach
the training accounts with their original headers intact. My warnings
about forwarding were intended for those forwarding using a mail
client's "forward" feature, which discards the original headers and
creates new ones.

However, you do have to make sure you train on both ham and spam, and
that the training ratio isn't wildly off. With this kind of setup, 1:9
is the most lopsided ham:spam ratio you should allow.
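
As a rough sketch of that rule of thumb, here is a small Python check you
could run against your own message counts before training. The function
name and the idea of counting the messages yourself are hypothetical
illustrations, not part of SpamAssassin; only the 1:9 limit comes from
the advice above.

```python
def ham_spam_ratio_ok(ham_count: int, spam_count: int,
                      max_spam_per_ham: float = 9.0) -> bool:
    """Return True if training is no more lopsided than 1:9 ham:spam.

    ham_count and spam_count are the numbers of messages you intend to
    train as ham and spam. max_spam_per_ham=9.0 encodes the 1:9 limit.
    """
    if ham_count <= 0:
        # No ham at all: the ratio is as lopsided as it can get.
        return False
    return spam_count <= max_spam_per_ham * ham_count


if __name__ == "__main__":
    print(ham_spam_ratio_ok(100, 900))  # exactly 1:9, still acceptable
    print(ham_spam_ratio_ok(10, 990))   # 1:99, far too spam-heavy
```

If the check fails, the fix is to feed more of your own ham, not to
hold back spam.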

If your training is 99% spam on the To:[EMAIL PROTECTED] address, then
SA's bayes is going to assume that carlo gets nothing but spam, and all
mail sent there will be biased a bit by this.
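
To see why a token like your own address ends up looking spammy, here is
a deliberately simplified Python sketch of a per-token estimate. This is
not SA's actual formula (the real computation is more involved); the
function name and the counts are made up purely for illustration.

```python
def token_spam_probability(spam_hits: int, ham_hits: int,
                           nspam: int, nham: int) -> float:
    """Simplified per-token spam estimate (illustration only).

    spam_hits / ham_hits: trained spam / ham messages containing the token.
    nspam / nham: total trained spam / ham messages.
    """
    spam_freq = spam_hits / nspam if nspam else 0.0
    ham_freq = ham_hits / nham if nham else 0.0
    total = spam_freq + ham_freq
    if total == 0:
        # Token never seen in training: no evidence either way.
        return 0.5
    return spam_freq / total


if __name__ == "__main__":
    # Hypothetical counts: every trained spam carries the address, but
    # only 10 of 990 trained ham do (e.g. most ham came from elsewhere).
    print(token_spam_probability(990, 10, 990, 990))
    # Same address, with ham that was actually addressed to it.
    print(token_spam_probability(990, 990, 990, 990))
```

With the first set of counts the address scores close to 1, which is
the bias described above; once ham addressed to the same recipient is
trained too, the token drifts back toward neutral.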



