Ah, right, thanks for the nudge. So, a couple of questions on this concept, again assuming that we're using UseSubjectsAsMaillogNames.

1) Wouldn't it be better to first remove files with the exact same subject, instead of looking only at their age? Say, leave 2 or 3 for variance, but if there are 10 of the same, delete all but 2 or 3. As I understand it, this would help keep the corpus diverse, because files are deleted by age only after the duplicates are removed. I have code for this if you're interested. Is it a good idea? I'm no expert.

2) Wouldn't it be better to randomly delete every nth file before a rebuild when the collection gets too big, instead of deleting by date with MaxBayesFileAge?

Here's what I do:

a) First delete really old files. I delete files older than 300 days for spam and non-spam, and 600 days for error reports (since those seem more accurate, being human reported). This generally doesn't clean up much, since it runs every day: just a couple of old files here and there.

b) Then, after doing the same-subject deletion (described in 1 above), go through each folder and count the files. If a folder is over its max, calculate how far over it is as a percentage. Then I cycle through each file and do a rand(100). If the random number is less than the calculated percentage, delete that file. This doesn't consider age at all, quite intentionally. It removes approximately enough files to keep the folder at the max (could be more, could be less, depending on rand).

Again, is this a good idea? I have working code for this too.

Ken

On Fri, Sep 11, 2009 at 1:38 AM, Fritz Borgstedt <f...@iworld.de> wrote:

> ASSP development mailing list <assp-test@lists.sourceforge.net> wrote:
> >can you kindly nudge me in the right direction?
>
> MaintBayesCollection : Maintenance for Bayesian Collection
> Set this to on, if you want ASSP to run maintenance tasks on the
> bayesian collection folders ( spamlog , notspamlog , correctedspam ,
> correctednotspam ). ASSP will delete the oldest files until the number
> of files per folder reaches MaxFiles.
> If you want ASSP to delete files
> because of their age instead of the number of files ( MaxFiles ),
> set MaxBayesFileAge to your needs.
> This option is useful if UseSubjectsAsMaillogNames is set to on
> and doMove2Num is set to off, because in this case the number of files
> in every collection folder will grow without limit.
>
> In V2 it is in section "collecting"
> In V1 it is in section "rebuildspamdb"
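
For illustration, the cleanup steps described in the questions above could be sketched in Python roughly as follows. This is not ASSP code and not the script referred to in the message. The function names, the `keep=3` default, the choice to keep the newest duplicates, and the rule that a subject is the file name minus its extension and a trailing `--<digits>` suffix are all assumptions made for the example. One detail matters for step b): for the folder to end up near the max, the percentage has to be taken relative to the current file count, i.e. (count - max) / count, which is what the sketch does.

```python
import random
import re
import time
from collections import defaultdict
from pathlib import Path

DAY = 86400  # seconds per day


def delete_older_than(folder, max_age_days, now=None):
    """Step a): remove files whose mtime is older than max_age_days."""
    cutoff = (time.time() if now is None else now) - max_age_days * DAY
    removed = 0
    for f in Path(folder).iterdir():
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()
            removed += 1
    return removed


def subject_key(path):
    """Assumed naming rule: drop the extension and a trailing '--<digits>'."""
    return re.sub(r"--\d+$", "", Path(path).stem).lower()


def keep_few_per_subject(folder, keep=3, key=subject_key):
    """Question 1: keep only the `keep` newest files for each subject."""
    groups = defaultdict(list)
    for f in Path(folder).iterdir():
        if f.is_file():
            groups[key(f)].append(f)
    removed = 0
    for files in groups.values():
        # Newest first; the file name breaks ties so the order is stable.
        files.sort(key=lambda f: (f.stat().st_mtime, f.name), reverse=True)
        for f in files[keep:]:
            f.unlink()
            removed += 1
    return removed


def thin_randomly(folder, max_files, rng=random):
    """Step b): delete each file with probability (count - max) / count."""
    files = [f for f in Path(folder).iterdir() if f.is_file()]
    if len(files) <= max_files:
        return 0
    percent_over = 100.0 * (len(files) - max_files) / len(files)
    removed = 0
    for f in files:
        # Same idea as rand(100) in the description; age is ignored on purpose.
        if rng.random() * 100 < percent_over:
            f.unlink()
            removed += 1
    return removed
```

A daily run under these assumptions would call `delete_older_than(folder, 300)` (or 600 for the error-report folders), then `keep_few_per_subject(folder)`, then `thin_randomly(folder, max_files)`, all before the rebuild. Because the last step is random, the final count only lands near `max_files`, not exactly on it.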