On Fri, Sep 30, 2005 at 09:14:53AM +0200, Kjetil Kjernsmo wrote:
> On Thursday 29 September 2005, 21:51, Roberto C. Sanchez wrote:
> > So, I finally decided to get with the 20th century and install
> > spamassassin (actually spampd hooked through postfix) to do site-wide
> > spam filtering for my server.
>
> Yiiihaaa!
>
> > My question is this. As I am training
> > it with sa-learn, is it (good|bad|indifferent) to train it on spam
> > that has already been flagged as spam? That is, will this reinforce
> > spamassassin's notion of spam or ruin it?
>
> No, that's fine. In fact, SA has this autowhitelist concept that does
> exactly that (it's not really a whitelist, though, more an "evening out
> weird things that may happen", I'm not using it).
>
> You should have a good look at bayes_ignore_header, so that it won't
> train on things that are obviously in spam. SA is pretty good at this
> itself, but if you see spam that has been filtered elsewhere a lot, be
> sure to use it.
>
> I'm guessing that you, like me, are doing this for your family. In that
> case, I have found that it is quite sufficient to train a single
> database with the spam and ham of the entire family. If you have more
> diverse users, you would probably need to have a per-user
> configuration. For example, a friend of mine has an uncle who is a
> psychiatrist working with people with gambling obsessions, and SA was
> pretty catastrophic for him until he got a per-user config.
>
> Finally, I found that SA, in its default 3.0 form, was much too
> conservative about the assigned scores, so I have a bunch of rules that
> I have adjusted the score of. You'll get some experience about that in
> time, I guess. Also note that SA 3.1 has been released upstream.
>

Cool. Thanks for the quick, informative reply.

-Roberto

-- 
Roberto C. Sanchez
http://familiasanchez.net/~roberto