On Thu, 2 Nov 2006, itdelany wrote:

> I successfully processed ham and spam emails with sa-learn, through spam
> and ham mail accounts. Now I will wait for users to send me new spam
> messages to enrich the Bayesian filter.
> What is the best thing to do with the old processed spam messages? Delete
> them, or re-apply the learning on them along with the new messages?

It depends on the size of the corpus and whether you are doing purely
manual training.

I believe in keeping them around (though aged or saved in an archive
directory, so that sa-learn doesn't try to re-learn them every time) in
case I need to retrain from scratch for some reason.

My nightly learning script (posted here, check the archives) ignores
message files that haven't been modified in the last three days, and
I rotate the files where users save messages-to-be-learned monthly, so
that sa-learn examines at most one month of messages per user,
regardless of how large the corpus gets.
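
For anyone who wants the general idea without digging through the
archives, here is a rough sketch of the three-day age filter in Python.
It is not the script posted to the list. It assumes one message per
file in a per-user directory (for mbox files, sa-learn would also need
--mbox), and the function names and directory layout are made up for
illustration.

```python
import os
import subprocess
import time

MAX_AGE_DAYS = 3  # window taken from the post; adjust to taste


def recently_modified(directory, max_age_days=MAX_AGE_DAYS, now=None):
    """Return sorted paths of regular files modified within the window."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    paths = []
    for entry in os.scandir(directory):
        # Skip subdirectories and anything untouched since the cutoff.
        if entry.is_file() and entry.stat().st_mtime >= cutoff:
            paths.append(entry.path)
    return sorted(paths)


def learn(directory, kind):
    """Feed recent files to sa-learn; kind must be 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    files = recently_modified(directory)
    if files:
        # Assumes sa-learn is on PATH and each file holds one message.
        subprocess.run(["sa-learn", "--" + kind] + files, check=True)
    return files
```

Something like learn("/home/alice/learn-spam", "spam") from cron each
night would then only touch files changed in the last three days (the
path is hypothetical). The monthly rotation would be handled separately,
for example by moving older files into an archive directory.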

'course, I only have four users...

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The first time I saw a bagpipe, I thought the player was torturing
  an octopus. I was amazed they could scream so loudly.
                                        -- cat_herder_5263 on Y! SCOX
-----------------------------------------------------------------------
 5 days until the campaign ads stop
