On Mon, 14 Aug 2006 16:28:21 +0700, Beast <[EMAIL PROTECTED]> wrote:

>Nigel Frankcom wrote:
>>
>>>> I will turn on auto leaarn mostly because I need to feed more HAM to SA
>>>> (so far I only feed ham for any false positive which is very low daily
>>>> and i think that is not good enough for SA)
>>>>
>>> If it is well trained then Bayes should be hitting. It may be that
>>> SA cannot get to the Bayes database due to privileges.
>>>
>>> (I manually train here. I distrust automatic training.)
>>>
>>> {^_^}
>>>
>>
>> I agree with not autotraining, imo it's a damned good way to get your
>> bayes poisoned. With beast's error I got the impression only _some_
>> mails were being missed which would imply either a file lock issue or
>> not enough child processes?
>>
>I also agree with your point, however I need to feed more HAM (not spam)
>message, which is not easy to obtain, unless we dump all users mail to
>one mailbox.
>
>For bayes file locking problem, I'm not quite sure because not complaint
>in log:
>
>Aug 13 22:11:01 blowfish spampd[9828]: clean message
><[EMAIL PROTECTED]> (1.67/5.20) from
><[EMAIL PROTECTED]> for <[EMAIL PROTECTED]> in 0.33s, 2587 bytes.
>
>Yesterday, i was received 5 FN mails which are not have scanned by
>bayes (low score), this for postmaster only, i'm not sure if its
>applicable to other address also.
>
>--beast

A lot will depend on the circumstances your email servers run under
and the terms & privacy options your site uses. Fortunately that's not
much of an issue here. I have an application that pulls mails out of
the archive for our mailservers; after that it's a case of finding
either ham or specific spam to train on.

You might try training on the ham in your own mailbox, though with a
large userbase you ideally want to train on a corpus that is
representative of the mail all your users receive. Either way, it's
going to involve some work (though significantly less work than
clearing up after the spammers).

I've found here that after the initial training run, just adding in
reported FPs and FNs is sufficient to keep bayes accurate. That
doesn't usually involve more than a few mails a month.

Nigel
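
P.S. For what it's worth, here is a rough sketch of the "representative corpus" idea, not a description of the application I actually run. It assumes you already have, per user, a list of paths to messages known to be ham; the names `mail_by_user`, `sample_ham_corpus` and `sa_learn_ham_command` are made up for illustration. It takes the same small number of messages from each user so no single mailbox dominates, then builds (but does not run) an `sa-learn --ham` command line.

```python
import random


def sample_ham_corpus(mail_by_user, per_user, seed=0):
    """Pick up to `per_user` known-ham message paths from each user.

    `mail_by_user` is a hypothetical mapping of user name to a list of
    message file paths; how you build it depends on your mail store.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    corpus = []
    for user in sorted(mail_by_user):
        messages = sorted(mail_by_user[user])
        # Users with little mail simply contribute everything they have.
        count = min(per_user, len(messages))
        corpus.extend(rng.sample(messages, count))
    return corpus


def sa_learn_ham_command(paths):
    """Build the argument list for training the sampled paths as ham."""
    return ["sa-learn", "--ham", *paths]


if __name__ == "__main__":
    example = {
        "alice": ["alice/1.eml", "alice/2.eml", "alice/3.eml"],
        "bob": ["bob/1.eml"],
    }
    picked = sample_ham_corpus(example, per_user=2)
    print(" ".join(sa_learn_ham_command(picked)))
```

Run the resulting command as the same user that SA scans mail as, so the training lands in the Bayes database that is actually being consulted.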