Ah, right, thanks for the nudge.

So, a few questions on this concept, again assuming that we're using
UseSubjectsAsMaillogNames.

1) Wouldn't it be better to first remove files that share the exact same
subject, rather than deleting by age alone?  For example, if there are 10
files with the same subject, delete all but 2 or 3 and keep those for
variance.  As I understand it, this would help keep the corpus diverse,
because deletion by age would only happen after the duplicates are removed.
I have code for this if you're interested.  Is it a good idea?  I'm no
expert.

2) Wouldn't it be better to randomly delete a proportion of the files before
a rebuild when the collection gets too big, instead of deleting by date with
MaxBayesFileAge?  Here's what I do:

a) First delete really old files.  I delete files older than 300 days for
spam and non-spam, and 600 days for error reports (since those are
human-reported and seem more accurate).  This generally doesn't clean up
much, since it runs every day, just a couple of old files here and there.

b) Then, after doing the delete-same-subject task (described in 1 above), go
through each folder and count the files.  If the count is over the max for a
folder, calculate how far over it is as a percentage.  Then I cycle through
each file and do a rand(100).  If the random number is less than the
percentage calculated, delete that file.  This doesn't consider age at all,
quite intentionally.  This will remove approximately enough files to keep
the folder at the max (could be more, could be less, depending on rand).
Again, is this a good idea?  I have working code for this too.
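
A rough sketch of the duplicate-subject cleanup from 1), written in Python
rather than ASSP's Perl.  The subject_key helper and the "--<digits>" suffix
it strips are assumptions made for illustration, not ASSP's actual file
naming.  Keeping the newest copies is also an assumption; the idea only
requires that 2 or 3 survive.

```python
import os
import re
from collections import defaultdict


def subject_key(filename):
    """Hypothetical: recover the subject by dropping the extension and a
    trailing '--<digits>' counter. Adjust to the real naming scheme."""
    stem = os.path.splitext(filename)[0]
    return re.sub(r"--\d+$", "", stem).lower()


def prune_duplicate_subjects(folder, keep=3, dry_run=True):
    """Return the files beyond `keep` per subject; delete them unless dry_run."""
    groups = defaultdict(list)
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            groups[subject_key(name)].append(path)

    removed = []
    for paths in groups.values():
        # Newest first, so the `keep` most recent copies survive (assumption).
        paths.sort(key=os.path.getmtime, reverse=True)
        for path in paths[keep:]:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

With dry_run left at True, the function only reports what it would delete,
which makes it easy to check the grouping before touching the corpus.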

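The random thinning in 2b) can be sketched the same way, again in Python as
an illustration only.  One assumption to note: "how far over as a
percentage" is read here as the excess divided by the current file count,
because that is the deletion probability that brings the folder back to the
max on average.  The function names are hypothetical.

```python
import os
import random


def excess_percentage(count, max_files):
    """Percentage of files to drop so that `count` shrinks to `max_files`
    on average. Assumes the excess is measured against the current count."""
    if count <= max_files:
        return 0.0
    return 100.0 * (count - max_files) / count


def thin_folder(folder, max_files, rng=random, dry_run=True):
    """Give every file the same chance of deletion, ignoring age entirely."""
    paths = [os.path.join(folder, name) for name in os.listdir(folder)]
    paths = [p for p in paths if os.path.isfile(p)]
    pct = excess_percentage(len(paths), max_files)

    removed = []
    for path in paths:
        # Equivalent of rand(100): a float in [0, 100).
        if rng.random() * 100 < pct:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

For example, a folder with 12,000 files and a max of 10,000 gives each file
roughly a 16.7% chance of deletion, so the result lands near the max but not
exactly on it.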

Ken
On Fri, Sep 11, 2009 at 1:38 AM, Fritz Borgstedt <f...@iworld.de> wrote:

> ASSP development mailing list <assp-test@lists.sourceforge.net>
> writes:
> >can you kindly nudge me in the right direction?
>
> MaintBayesCollection : Maintenance for Bayesian Collection
>  Set this to on if you want ASSP to run a maintenance task on the
> bayesian collection folders ( spamlog , notspamlog , correctedspam ,
> correctednotspam ). ASSP will delete the oldest files until the number
> of files per folder reaches MaxFiles. If you want ASSP to delete files
> because of their age instead of the number of files ( MaxFiles ),
> set MaxBayesFileAge to your needs.
>  This option is useful if UseSubjectsAsMaillogNames is set to on
> and doMove2Num is set to off, because in this case the number of files
> in every collection folder will grow without limit.
>
> In V2 it is in section "collecting"
> In V1 it is in section "rebuildspamdb"
>
>
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus
> on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>