I'll try to simplify my discussion a bit.

1) It's my understanding that currently files are only deleted when subject logging is on and move2numb is off, and then only by date. Yes? I want to see random deletion in 0.4.14.

2) We agree that deletion by date isn't the best approach for Bayesian filtering, yes? If so, I want to keep the number of files closer to MaxFiles by first removing what is probably a duplicate email. The easiest way to do this that I've thought of is to delete based on subject names. That removes what is probably the same message, and then we can delete randomly to get down to MaxFiles. That leaves more unique messages, which matters because duplicates aren't considered by ASSP.

3) I'm confused by the MaintBayesCollection option. I use the Bayesian filter, and I do NOT want files removed from the folders automatically, oldest first, to get down to MaxFiles. I want them trimmed by subject first, then randomly. My earlier point was that the admin description of MaintBayesCollection suggests that files will be deleted by date. This doesn't have anything to do with MaxNoBayesFileAge etc., does it? The max file age options say things like "A value of 0 disables this feature and no file will be deleted because of its age", but does that override the processing the admin server says will happen if MaintBayesCollection is checked (deleting by age to get down to MaxFiles)?

4) You don't have the min option in ASSP now, do you? I think Brett and I are basically saying the same thing here. I like the TTL language, though "min" would be more consistent IMO.

On Tue, Sep 15, 2009 at 1:31 PM, Thomas Eckardt/eck <thomas.ecka...@thockar.com> wrote:
> I do not understand the discussion!
>
> All of these wishes are already built into ASSP, except removing mails with
> the same subject. I do not like this idea, because the subject is ignored by
> rebuildspamdb: only the body is used, and of several mails with the same
> body all but one are ignored and will be deleted 60 days later.
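For illustration, a minimal Python sketch of the trimming order proposed in point 2: when a folder holds more than MaxFiles files, probable duplicates (files whose subject-derived names match) are deleted first, then random survivors until the folder is down to MaxFiles. `plan_deletions` and `subject_key` are hypothetical helpers, not ASSP functions, and the naming rule in `subject_key` (subject, optional `_<n>` counter, extension) is an assumption about how UseSubjectsAsMaillogNames names files. Duplicates are removed only until the limit is met, which is one reading of "keep the number of files closer to MaxFiles". As Thomas points out, rebuildspamdb compares bodies, not subjects, so a subject match is only a heuristic for a duplicate.

```python
import random
import re


def subject_key(filename):
    """Hypothetical: map a subject-named file to a comparable key.

    Assumes names like 'Cheap meds_2.eml': the extension and a trailing
    '_<n>' counter are dropped and the rest is case-folded. ASSP's real
    naming scheme may differ.
    """
    stem = filename.rsplit(".", 1)[0]
    stem = re.sub(r"_\d+$", "", stem)
    return stem.strip().lower()


def plan_deletions(files, max_files, rng=None):
    """Hypothetical helper: return the file names to delete so that at most
    max_files remain, removing probable duplicates before random ones."""
    rng = rng or random.Random()
    excess = len(files) - max_files
    if excess <= 0:
        return []  # folder is within its limit: nothing to delete

    seen = set()
    unique, duplicates = [], []
    for name in sorted(files):  # sorted, so the copy that is kept is predictable
        key = subject_key(name)
        if key in seen:
            duplicates.append(name)
        else:
            seen.add(key)
            unique.append(name)

    # Step 1: probable duplicates go first, but only as many as needed.
    doomed = duplicates[:excess]
    # Step 2: if still over the limit, delete randomly chosen unique files.
    remaining = excess - len(doomed)
    if remaining > 0:
        doomed += rng.sample(unique, remaining)
    return doomed
```

With the five names "Cheap meds.eml", "Cheap meds_1.eml", "Cheap meds_2.eml", "Hello.eml" and "Invoice.eml" and a limit of 3, only the two extra "Cheap meds" copies are selected; with a limit of 2, one more file is picked at random from the unique ones.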
>
> -------------------------------------------
> ['MaintBayesCollection','Maintenance for Bayesian Collection',0,\&checkbox,'','(.*)',undef,
> 'Set this to on, if you want ASSP to run maintenance tasks on the
> bayesian collection folders ( spamlog , notspamlog , correctedspam ,
> correctednotspam ). ASSP will delete the oldest files until the number of
> files per folder reaches MaxFiles. If you want ASSP to delete files
> because of their age instead of the number of files ( MaxFiles ), setup
> MaxBayesFileAge and/or MaxCorrectedDays to your needs.<br />
> This option is useful, if UseSubjectsAsMaillogNames is set to on and
> doMove2Num is set to off, because in this case the number of files in
> every collection folder will grow infinitely.',undef,undef,'msg006140','msg006141'],
>
> ['MaxBayesFileAge','Max Age of Bayes Files',10,\&textinput,0,'(\d+)',undef,
> 'The maximum file age in days of every file in every bayesian collection
> folder ( spamlog , notspamlog ). If MaintBayesCollection is set to on and
> a file is older than this number in days, the file will be deleted.
> Default is 0. A value of 0 disables this feature and no file will be
> deleted because of its age.<br />
> <span class = "negative">Do not define this option, if you use the
> bayesian engine of ASSP. Deleting files because of their age is wrong in
> this case!!!!!</span>',undef,undef,'msg006150','msg006151'],
>
> ['MaxCorrectedDays','Max Corrected File Age',5,\&textinput,'1000','(\d+)',undef,
> 'This is the number of days an error report will be kept in the
> correctednotspam and correctedspam folders. These folders are the
> longterm memory of ASSP, therefore the default is 1000 days. ',undef,undef,'msg008590','msg008591'],
>
> ['MaxNoBayesFileAge','Max Age of non Bayes Files',10,\&textinput,0,'(\d+)',undef,
> 'The maximum file age in days of every file in every non bayesian
> collection folder ( incomingOkMail , discarded , viruslog ). If defined
> and a file is older than this number in days, the file will be deleted.
> Default is 0. A value of 0 disables this feature and no file will be
> deleted because of its age.',undef,undef,'msg006160','msg006161'],
> ---------------------------------------------
>
> If MaintBayesCollection is set to on, it is your choice how to set the
> rest to your needs:
>
> - MaxBayesFileAge/MaxNoBayesFileAge == 0: the number of files is reduced
>   to MaxFiles by deleting the oldest
> - MaxBayesFileAge/MaxNoBayesFileAge != 0: the number of files is reduced
>   by deleting all files that are older than XX
> - MaxCorrectedDays: these files should never be deleted (use 1000000)
>
> And keep in mind: if the number of files per folder is reduced to
> MaxFiles at 1:00 AM and rebuildspamdb runs at 11:00 PM, rebuildspamdb
> may have to process many more than MaxFiles files!
>
> Currently there is a mistake in this maintenance task: the files whose
> file date is set 60 days in the future are the last files to be deleted.
> This will be fixed in 4.14.
>
> Thomas
>
>
> "GrayHat" <gray...@gmx.net>
> 15.09.2009 18:35
> Please reply to
> GrayHat <gray...@gmx.net>; Please reply to
> ASSP development mailing list <assp-test@lists.sourceforge.net>
>
> To
> "ASSP development mailing list" <assp-test@lists.sourceforge.net>
> Cc
>
> Subject
> Re: [Assp-test] Antwort: Re: Antwort: Re: Antwort: Re:fixesandnewsin
> 2.0.1_RC0.4.12
>
>
> >> Hmm... that sounds like an idea which was brought up some
> >> time ago (John was still the dev for ASSP at the time); that
> >> is, set up some kind of TTL parameter for corpus files so
> >> that the spamdb rebuild checks the file date/time and,
> >> if it is over the TTL (say "n" days), deletes the file.
>
> > My thought is that the "TTL" would only be in effect for the purpose
> > of keeping BlockReporting working (for however many days or
> > weeks you wish the emails to be guaranteed resendable).
> > After that time, the TTL is null and the files are game for
> > replacement. I thought it a simple idea for working around
> > the BlockReporting problem Thomas mentioned.
>
> I see, but there's no need to store anything along with the files;
> the regular filesystem timestamp of each file will work just
> fine: just remove every file for which "(today - filetime) > TTL".
>
> > On a low-to-medium traffic box, though, this would not be a
> > problem. We already deal with bunches of identical
> > messages from time to time (nothing new).
>
> There may be a solution for that too: assuming the spam and
> notspam folders get cleaned up using the TTL, the files could
> be saved using (e.g.) an MD5 hash (or the like) as the name,
> so that identical messages won't be stored more than once.
> That said, it may have some side effects and may need some
> more thinking, but...
>
> >> Bottom line: the bayes filter should work by /learning/; this
> >> means that it should NOT discard the previous data, but
> >> rather REFINE them from further data coming in. So maybe the
> >> whole bayes approach used inside ASSP should be revised NOT
> >> to deal just with the latest data but to learn/improve over time.
>
> > Just an idea, but how do you "NOT" discard data while keeping
> > rebuild times low and maintaining free hard drive space
> > (realistically)?
>
> Using some kind of "digest" of the previous bases, stored in a
> more compact format.
>
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry® Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9-12, 2009. Register
> now!
> http://p.sf.net/sfu/devconf
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>
>
> DISCLAIMER:
> *******************************************************
> This email and any files transmitted with it may be confidential, legally
> privileged and protected in law and are intended solely for the use of the
> individual to whom it is addressed.
> This email was multiple times scanned for viruses. There should be no
> known virus in this email!
> *******************************************************
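A minimal Python sketch of the two MaintBayesCollection modes Thomas lists above, assuming a folder is given as a mapping from file name to modification time in epoch seconds. With a max age of 0, the oldest files are deleted until MaxFiles remain. With a non-zero max age, every file older than that many days is deleted and MaxFiles is not consulted, following the admin text's "instead of the number of files". `files_to_delete` is a hypothetical helper for illustration, not ASSP's code. In the count-based branch, a file whose timestamp lies in the future sorts as the newest and is deleted last, which is the behaviour Thomas calls a mistake to be fixed in 4.14.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_to_delete(mtimes, max_files, max_age_days, now):
    """Hypothetical helper: return the file names a maintenance run would delete.

    mtimes       -- dict mapping file name -> modification time (epoch seconds)
    max_files    -- the MaxFiles limit, used only when max_age_days is 0
    max_age_days -- MaxBayesFileAge-style setting; 0 means "count-based"
    now          -- current time in epoch seconds
    """
    if max_age_days:
        # Age-based: delete every file older than max_age_days.
        cutoff = now - max_age_days * SECONDS_PER_DAY
        return sorted(name for name, mtime in mtimes.items() if mtime < cutoff)

    # Count-based: delete the oldest files until max_files remain.
    excess = len(mtimes) - max_files
    if excess <= 0:
        return []
    # Oldest first; a future-dated file sorts last and therefore survives longest.
    oldest_first = sorted(mtimes, key=lambda name: (mtimes[name], name))
    return oldest_first[:excess]
```

For example, with files aged 20, 5 and 1 days plus one stamped 60 days in the future, a count-based run with MaxFiles = 2 deletes the 20-day and 5-day files and keeps the future-dated one, while an age-based run with a 10-day limit deletes only the 20-day file.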