Right, BUT if we're limiting the total number of files in the directory, wouldn't it be better to delete these duplicates to give a more diverse corpus?
We can look at their subject name and then delete them. This leaves room for other files. Using subject logging and NOT move2numb, I guess I need some more clarification at this point. What method do you recommend using to keep the number of files down? Simply deleting by age isn't going to cut it if you want a diverse corpus, is it?

So what are your thoughts, Thomas, on my method: remove same subjects, remove really old files by date, then remove a percentage (selected randomly) based on the overage in each folder?

Note: in the MaxBayesFileAge, you've got:

    Do not define this option, if you use the bayesian engine of ASSP. Deleting files because of there age, is wrong in this case!!!!!

It should be "their age." There are a bunch of other errors like this, which I privately emailed to Fritz, on request. Should I send you that email too?

THANKS

On Sun, Sep 13, 2009 at 2:00 AM, Thomas Eckardt/eck <thomas.ecka...@thockar.com> wrote:

> >What do you think about deleting redundant corpus emails
> >based on the subject?
>
> Redundant corpus emails are skipped/deleted based on their content (md5
> hash).
>
> Thomas
>
> K Post <nntp.p...@gmail.com>
> 13.09.2009 03:28
> Please reply to
> ASSP development mailing list <assp-test@lists.sourceforge.net>
>
> To
> ASSP development mailing list <assp-test@lists.sourceforge.net>
> Cc
>
> Subject
> Re: [Assp-test] fixes and news in 2.0.1_RC0.4.12
>
> This is great, and thanks SO much for adding my idea of the max days for
> corrected spam. What do you think about deleting redundant corpus emails
> based on the subject?
>
> On Sat, Sep 12, 2009 at 1:18 PM, Thomas Eckardt/eck <
> thomas.ecka...@thockar.com> wrote:
>
> > Hi all,
> >
> > I'm back.
> >
> > fixed in 4.12:
> >
> > - for some messages the mail header was transferred two times
> > - changing the display language was not working in any case
> > - the hintbox in the config part of the GUI has shown wrong
> >   updated/changed values
> >
> > added in 4.12
> >
> > MaxCorrectedDays
> > msg008590=Max Corrected File Age
> > msg008591=This is the number of days a error report will be kept in the
> > correctednotspam and correctedspam folders. These folders are the longterm
> > memory of ASSP, therefore the default is 1000 days.
> >
> > changed in 4.12:
> >
> > - the change language part is moved to the main config form !
> >
> > Thomas
> >
> > DISCLAIMER:
> > *******************************************************
> > This email and any files transmitted with it may be confidential, legally
> > privileged and protected in law and are intended solely for the use of the
> > individual to whom it is addressed.
> > This email was multiple times scanned for viruses. There should be no
> > known virus in this email!
> > *******************************************************
> >
> > ------------------------------------------------------------------------------
> > Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> > trial. Simplify your report design, integration and deployment - and focus
> > on what you do best, core application coding. Discover what's new with
> > Crystal Reports now. http://p.sf.net/sfu/bobj-july
> > _______________________________________________
> > Assp-test mailing list
> > Assp-test@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/assp-test
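
The three-step pruning proposed in this message (drop files with a repeated subject, drop files past a maximum age, then randomly drop enough files to remove the remaining overage) could be sketched roughly as below. This is a minimal Python illustration, not ASSP code: `CorpusFile`, `select_for_deletion`, `max_files` and `max_age_days` are hypothetical names, and it assumes the subject and age of each file have already been read from the folder.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusFile:
    name: str        # file name inside the corpus folder (hypothetical)
    subject: str     # Subject header of the stored message
    age_days: float  # age of the file in days


def select_for_deletion(files, max_files, max_age_days, rng=None):
    """Return the names of files to delete, applying three passes in order."""
    rng = rng or random.Random()
    doomed = []

    # Pass 1: keep only the newest file for each subject
    # (subjects are compared case-insensitively, whitespace collapsed).
    newest_by_subject = {}
    for f in sorted(files, key=lambda f: f.age_days):
        key = " ".join(f.subject.lower().split())
        if key in newest_by_subject:
            doomed.append(f.name)
        else:
            newest_by_subject[key] = f
    survivors = list(newest_by_subject.values())

    # Pass 2: drop files older than the age limit.
    doomed += [f.name for f in survivors if f.age_days > max_age_days]
    survivors = [f for f in survivors if f.age_days <= max_age_days]

    # Pass 3: if the folder is still over the limit, drop a random
    # selection equal in size to the overage.
    overage = len(survivors) - max_files
    if overage > 0:
        doomed += [f.name for f in rng.sample(survivors, overage)]
    return doomed
```

The function only returns names, so the selection can be reviewed before anything is actually removed. Matching on subject alone is looser than the md5 content hash mentioned in the quoted reply, so two different messages that share a subject would be treated as duplicates here.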