Jim Ford writes:
>On Tue, Jul 08, 2003 at 03:09:47PM -0500, Thomas Cameron wrote:
>> If you have one message in the caught-spam folder and run sa-learn on it, it
>> will report it learned from one message.  If you leave that message in there
>> and add a second, sa-learn will only learn from the second, since it's
>> already read the first one.
>
>I reckon you're right, but I usually let the spam accumulate before I run
>sa-learn.
>
>Whilst we're on the subject of sa-learn, I'm always feeding spam to
>sa-learn, but neglectful in feeding it ham. I probably ought to do it more
>often because it might distort the database in some way. Comments, anyone?

Yeah, you should try to keep the levels roughly even -- Bayesian
classifiers have a general problem with out-of-kilter levels,
as discussed in this paper:

  http://www.ai.mit.edu/~jrennie/papers/icml03-nb.pdf

  (that's _Tackling the Poor Assumptions of Naive Bayes Text Classifiers_,
  Jason D. M. Rennie, Lawrence Shih, Jaime Teevan and David R. Karger,
  Proceedings of the Twentieth International Conference on Machine Learning, 2003)

I'm not quite sure what the results are, but I think you'd see a
higher tendency towards spam classifications, overall.
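
To make the skew concrete, here is a minimal sketch of a textbook naive
Bayes classifier with Laplace smoothing. It is not SpamAssassin's actual
Bayes code, which combines token probabilities differently; the function
names and the toy messages are made up for illustration. It trains once on
an even corpus and once on a spam-heavy one, then scores the same neutral
message against both.

```python
import math
from collections import Counter


def train(docs):
    """Count documents and tokens per class from (label, tokens) pairs."""
    doc_counts = Counter()
    token_counts = {"spam": Counter(), "ham": Counter()}
    for label, tokens in docs:
        doc_counts[label] += 1
        token_counts[label].update(tokens)
    return doc_counts, token_counts


def spam_probability(tokens, doc_counts, token_counts):
    """Posterior P(spam | tokens) under multinomial naive Bayes."""
    vocab = set(token_counts["spam"]) | set(token_counts["ham"])
    total_docs = sum(doc_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        # Class prior taken from the training proportions; this is where
        # an uneven corpus enters the calculation most directly.
        score = math.log(doc_counts[label] / total_docs)
        total_tokens = sum(token_counts[label].values())
        for tok in tokens:
            # Laplace (add-one) smoothing so unseen tokens never give log(0).
            score += math.log(
                (token_counts[label][tok] + 1) / (total_tokens + len(vocab))
            )
        log_score[label] = score
    # Normalise in a numerically stable way.
    top = max(log_score.values())
    weights = {k: math.exp(v - top) for k, v in log_score.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


# Hypothetical toy messages: "hello" appears once in every message,
# so on its own it carries no evidence either way.
spam_msg = ("spam", ["hello", "offer"])
ham_msg = ("ham", ["hello", "meeting"])

balanced = train([spam_msg] * 5 + [ham_msg] * 5)
skewed = train([spam_msg] * 50 + [ham_msg] * 5)

p_balanced = spam_probability(["hello"], *balanced)
p_skewed = spam_probability(["hello"], *skewed)
print(f"balanced corpus: P(spam) = {p_balanced:.3f}")
print(f"skewed corpus:   P(spam) = {p_skewed:.3f}")
```

With even training the neutral message comes out undecided; with ten times
as much spam as ham the same message leans heavily towards spam, which is
the effect I'd expect in the unbalanced case.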

--j.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
