> If I understand correctly, you could then add a "minimum life" variable
> that says the file has to be older than so many days before it can be
> deleted. If the file is not older than "minimum life", it is left until
> it is older...then deleted. Or something like that. So, there would be
> some extra files in the collections, but I don't imagine they would hurt
> bayesian too badly seeing as that's how it already is today.
Hmm... that sounds like an idea which was brought up some time ago (John was still the dev for ASSP at the time): set up some kind of TTL parameter for corpus files, so that the spamdb rebuild would check each file's date/time and delete the file if it is older than the TTL (say "n" days).

While at first it may sound like a "cool idea", it has drawbacks on both low- and high-traffic boxes. On a low-traffic box the spam/notspam folders would quickly "age" and end up almost empty. On a high-traffic box, since only the most recent mail would survive in the corpus, it would be easy to corrupt the corpus by sending in a bunch of identical messages :P

Bottom line: the bayes filter should work by /learning/, which means it should NOT discard the previous data but rather REFINE it with further data as it comes in. So maybe the whole bayes approach used inside ASSP should be revised NOT to deal just with the latest data, but to learn and improve over time.

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
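To make the trade-off in this thread concrete, here is a minimal Python sketch. Every name in it (`CorpusFile`, `ttl_prune`, `IncrementalBayes`) is hypothetical and not taken from ASSP. `ttl_prune` applies the proposed TTL rule to a list of corpus files, while `IncrementalBayes` illustrates the alternative of refining per-token counts as each message is learned, instead of rebuilding from whatever files survive. The per-token score is a simple frequency ratio chosen for illustration, not the formula ASSP uses.

```python
from collections import Counter
from dataclasses import dataclass

DAY = 86400  # seconds per day


@dataclass
class CorpusFile:
    name: str
    mtime: float  # last-modified time in seconds since the epoch


def ttl_prune(files, now, ttl_days):
    """TTL rule (hypothetical helper): split corpus files into (kept, expired).

    A real rebuild would read mtimes from disk (e.g. os.stat(path).st_mtime)
    and actually delete the expired files; here we only classify them.
    """
    cutoff = now - ttl_days * DAY
    kept = [f for f in files if f.mtime >= cutoff]
    expired = [f for f in files if f.mtime < cutoff]
    return kept, expired


class IncrementalBayes:
    """Token counts are refined as each message is learned; nothing expires."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, tokens, is_spam):
        # Count each token at most once per message, then bump the total.
        if is_spam:
            self.spam.update(set(tokens))
            self.n_spam += 1
        else:
            self.ham.update(set(tokens))
            self.n_ham += 1

    def token_spam_prob(self, token):
        # Ratio of the token's spam frequency to its combined frequency.
        spam_freq = self.spam[token] / self.n_spam if self.n_spam else 0.0
        ham_freq = self.ham[token] / self.n_ham if self.n_ham else 0.0
        if spam_freq + ham_freq == 0:
            return 0.5  # token never seen: neutral score
        return spam_freq / (spam_freq + ham_freq)


if __name__ == "__main__":
    now = 100 * DAY
    corpus = [CorpusFile("a.eml", now - 2 * DAY),
              CorpusFile("b.eml", now - 40 * DAY)]
    kept, expired = ttl_prune(corpus, now, ttl_days=30)
    print([f.name for f in kept], [f.name for f in expired])
    # prints ['a.eml'] ['b.eml']
```

With `ttl_days=30`, a low-traffic box that has received no new mail in the last 30 days ends up with an empty `kept` list, which is the "aging" problem described above; the incremental counts, by contrast, only ever grow as new mail is learned.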