On Fri, 12 Dec 2008, Marcin Krol wrote:

John Hardin wrote:
 On Thu, 11 Dec 2008, Karsten Bräckelmann wrote:

>  I still recommend initial training, to give Bayes a good kick-start.

 Initial _manual_ training.

Define manual: picking out spams by hand is just too labor-intensive.

Manual training of the initial corpus is 200 hams and 200 spams. That's not excessive.
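
As a rough sketch of that threshold (not a definitive recipe), the check below refuses to proceed until both hand-sorted folders hold at least 200 messages. The function names and the one-message-per-file folder layout are assumptions made for illustration; only the sa-learn --ham and --spam options are real. It builds the commands without running them.

```python
from pathlib import Path

MIN_PER_CLASS = 200  # figure from the post: 200 hams and 200 spams


def count_messages(folder):
    """Count regular files in a folder, assuming one message per file."""
    return sum(1 for p in Path(folder).iterdir() if p.is_file())


def initial_training_commands(ham_dir, spam_dir, minimum=MIN_PER_CLASS):
    """Return sa-learn argument lists, or raise if either class is too small.

    ham_dir and spam_dir are hypothetical, hand-sorted folders.
    """
    counts = {"ham": count_messages(ham_dir), "spam": count_messages(spam_dir)}
    short = {kind: n for kind, n in counts.items() if n < minimum}
    if short:
        raise ValueError(f"corpus too small: {short} (need {minimum} of each)")
    return [
        ["sa-learn", "--ham", str(ham_dir)],
        ["sa-learn", "--spam", str(spam_dir)],
    ]
```

Each returned list could be handed to subprocess.run() once the folders have been checked by hand.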

Past that point, whether to keep training manually, add autotraining, or switch to it completely is the admin's preference, based on volume.

I manually train the few domains I host and manage for myself, family members and friends, and get good results. When I was administering a 100-user network, manual training was not a burden. I can't speak for someone administering a 1,000-, 10,000- or 100,000-user network.

I do have some distrust of autolearn given the complaints I've seen here that can be laid at its feet (but note, the successful users are understandably not complaining, so that impression is no doubt unfairly biased). I just like the idea of human judgement in the loop. A middle ground could be user spam- and ham-training folders, with manual review before feeding the messages to sa-learn.
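
One way that middle ground might look, purely as a sketch: users drop messages into a submission folder, a human approves some of them, and only the approved ones are moved to the folder that sa-learn later reads. The folder layout, the function name and the "approved" list are all hypothetical; nothing here is an existing SpamAssassin feature.

```python
import shutil
from pathlib import Path


def promote_reviewed(submitted_dir, reviewed_dir, approved):
    """Move only reviewer-approved messages into the folder sa-learn will read.

    submitted_dir: hypothetical folder where users drop candidate messages.
    reviewed_dir:  hypothetical folder later passed to sa-learn.
    approved:      file names a human reviewer has confirmed.
    Returns the names actually moved; unapproved messages stay where they are.
    """
    submitted, reviewed = Path(submitted_dir), Path(reviewed_dir)
    reviewed.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in sorted(approved):
        src = submitted / name
        if src.is_file():  # skip names that were never submitted
            shutil.move(str(src), str(reviewed / name))
            moved.append(name)
    return moved
```

The reviewed folder would then be fed to sa-learn --spam or sa-learn --ham as appropriate, so nothing reaches Bayes without a person having looked at it.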

But autolearn should _not_ be trusted for initial training. That will simply magnify small errors in the initial configuration, rather than helping to correct them. That's our point.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Government cannot grant rights. Government can only limit, infringe
  or suppress rights.
-----------------------------------------------------------------------
 3 days until Bill of Rights day
