On Thu, 2 Nov 2006, itdelany wrote:

> I successfully processed ham and spam emails with sa-learn, through spam
> and ham mail accounts. Now I will wait for users to send me new spam
> messages to enrich the Bayesian filter.
> What is the best thing to do with the old processed spam messages? Delete
> them, or re-apply the learning on them together with the new messages?
It depends on the size of the corpus and whether you are doing purely
manual training.

I believe in keeping them around (though aged, or saved in an archive
directory, so that sa-learn doesn't try to re-learn them every time) in
case I need to retrain from scratch for some reason.

My nightly learning script (posted here, check the archives) ignores
message files that haven't been modified in the last three days, and I
rotate the files where users save messages-to-be-learned monthly, so that
sa-learn examines at most one month of messages per user, regardless of
how large the corpus gets.

'course, I only have four users...

--
 John Hardin KA7OHZ
 http://www.impsec.org/~jhardin/
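
For anyone who wants to try the "skip files not modified in the last three
days" idea, here is a rough Python sketch. It is not the script from the
list archives. The function names, the one-message-per-file directory
layout, and the dry_run switch are all assumptions made for illustration;
the only things taken from sa-learn itself are its --spam and --ham options.

```python
import subprocess
import time
from pathlib import Path

MAX_AGE_DAYS = 3  # from the post: ignore files untouched for three days


def recent_files(directory, max_age_days=MAX_AGE_DAYS, now=None):
    """Return regular files in `directory` modified within the last N days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime >= cutoff
    )


def learn(directory, kind, dry_run=False):
    """Pass recently modified messages to sa-learn as 'spam' or 'ham'.

    Assumes one message per file (hypothetical layout). Returns the command
    that was (or would be) run, or an empty list if nothing is recent.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    files = recent_files(directory)
    if not files:
        return []
    cmd = ["sa-learn", f"--{kind}", *map(str, files)]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Older files stay on disk untouched, so they remain available if you ever
need to retrain from scratch; they just aren't handed to sa-learn again on
every nightly run.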