>What method do you recommend using to keep the number of files down? Simply
>deleting by age isn't going to cut it if you want a diverse corpus is it?

The best way is:

- do not use the subject for filenames, or use move2num
- never delete any file by age
- set maxfiles high enough to get a good corpus

This is how ASSP has worked for years. The new features are for special
usage. For example: I do not use the Bayes engine (only for testing the
code). I have a very good spamdb, which has not changed for 2 years (doing
some small scoring). The main way I detect spam is by PB and
IP tests (mx, ptr, helo, FBMVT, ...). Because some mails bypass ASSP (going
other ways), I use a Bayes engine in Lotus Domino, just for fun;
there are 1-5 messages blocked per week. But I love to see the subject in
Blockreports.

And keep in mind - Bayesian checks should be only a small part of spam
detection - because the simple (???) mathematics is not (and could never be)
perfect.

Setting up the bomb regexes and all PB-valence values the right way helps
much more than the Bayes check. It will take some time (and some work) to
find the best values for you.
I think Fritz and I (and possibly some others) have found that point:
our ASSP detects 99.99% (or even more) of spam. I have not seen a blocked
good email for over one year.

And in the end you have to weigh it up: accept that out of 200 users, 10
get one spam per day (with 30,000 or more connections), or
analyse tons of mails and logs to get rid of those 10.


>So what are your thoughts, Thomas, of my remove same subjects,

same subject - same email  => ignored by assp, only one mail is processed 
in rebuildspamdb

but

same subject - different body ...... oh, the subject is only one part of
the header (from, to, msg-id, ip, forwarder, ...) - where does it
end?
Just a joke! This will help to reduce the number of mails (files) for a
while, but it will not increase the quality of the corpus (-> spamdb) - which
is more important for you?
If you get tons of spam with the same subject, do not use Bayes to block
it - use subjectRe or headerRe or blackRe or ...
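
To illustrate the difference between "same email" and "same subject", here is a minimal Python sketch of content-based deduplication. It assumes the hash is an MD5 digest over the raw message bytes; how rebuildspamdb actually normalises a message before hashing is not stated here, and the function name and sample messages are made up for the example.

```python
import hashlib

def dedupe_by_content(messages):
    """Keep the first message seen for each distinct MD5 digest of its raw bytes."""
    seen = set()
    unique = []
    for raw in messages:
        digest = hashlib.md5(raw).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(raw)
    return unique

# Hypothetical sample corpus: two identical mails and one with the same
# subject but a different body.
corpus = [
    b"Subject: Cheap pills\n\nBuy now at example.test/a",
    b"Subject: Cheap pills\n\nBuy now at example.test/a",
    b"Subject: Cheap pills\n\nCompletely different body",
]

unique = dedupe_by_content(corpus)
print(len(unique))
```

Only the exact duplicate collapses; the mail that shares just the subject stays in the corpus, which is why subject-based removal and content-based removal are not the same thing.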

The Bayesian check is one of the last checks ASSP runs - so try to detect
spam before it gets that far.
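
The idea of catching a spam run with a regular expression before any statistical check is reached can be sketched like this in Python. Everything here is illustrative: the pattern, the function names and the check order are assumptions for the example, not ASSP's code or configuration syntax, and the Bayesian step is only a placeholder.

```python
import re

# Hypothetical subject pattern, standing in for an entry in a subject regex list.
SUBJECT_RE = re.compile(r"cheap\s+pills|you\s+won", re.IGNORECASE)

def subject_check(msg):
    """Return 'block' if the Subject header matches the pattern, else None."""
    headers = msg.split("\n\n", 1)[0]  # look at the header part only
    m = re.search(r"^Subject:[ \t]*(.*)$", headers, re.MULTILINE)
    subject = m.group(1) if m else ""
    return "block" if SUBJECT_RE.search(subject) else None

def bayes_check(msg):
    """Placeholder for a statistical classifier; it never blocks in this sketch."""
    return None

# Cheap, deterministic checks first; the statistical check last.
CHECKS = [subject_check, bayes_check]

def classify(msg):
    """Run the checks in order and stop at the first one that gives a verdict."""
    for check in CHECKS:
        verdict = check(msg)
        if verdict:
            return verdict, check.__name__
    return "pass", None

print(classify("Subject: CHEAP   pills\n\nhello"))
print(classify("Subject: lunch on Friday\n\nhello"))
```

A mail blocked by the first check never reaches the statistical step, so a flood of same-subject spam does not depend on the corpus at all.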

Thank you for your help fixing the mistakes in config descriptions.


Thomas






K Post <nntp.p...@gmail.com>
14.09.2009 16:53
Please reply to
ASSP development mailing list <assp-test@lists.sourceforge.net>

To
ASSP development mailing list <assp-test@lists.sourceforge.net>
Cc

Subject
Re: [Assp-test] Antwort: Re: fixes and news in 2.0.1_RC0.4.12






Right, BUT if we're limiting the total number of files in the directory,
wouldn't it be better to delete these duplicates to give a more diverse
corpus?

We can look at their subject name and then delete them.  This leaves room for
other files.

Using subject logging and NOT move2num, I guess I need some more
clarification at this point.

What method do you recommend using to keep the number of files down? Simply
deleting by age isn't going to cut it if you want a diverse corpus is it?

So what are your thoughts, Thomas, of my remove same subjects, remove really
old by date, then remove a percentage (selecting randomly) based on the
overage in each folder method?
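
The three-step pruning proposed above can be sketched in Python as follows. This only illustrates the proposal; it is not an ASSP feature, and the reply at the top of this thread advises against deleting by age at all. The dictionary keys, the function name and the limits are hypothetical.

```python
import random

def prune(files, max_files, max_age_days, rng=random):
    """Prune one folder.

    files: list of dicts with the hypothetical keys 'name', 'subject', 'age_days'.
    """
    # Step 1: keep only the first file seen for each subject.
    seen, kept = set(), []
    for f in files:
        if f["subject"] not in seen:
            seen.add(f["subject"])
            kept.append(f)
    # Step 2: drop files older than the age limit.
    kept = [f for f in kept if f["age_days"] <= max_age_days]
    # Step 3: if the folder is still over its limit, remove the overage at random.
    overage = len(kept) - max_files
    if overage > 0:
        doomed = set(rng.sample(range(len(kept)), overage))
        kept = [f for i, f in enumerate(kept) if i not in doomed]
    return kept

# Ten made-up files: subjects s0..s7 with s0 and s1 repeated, ages 0..9 days.
folder = [
    {"name": f"{i}.eml", "subject": f"s{i % 8}", "age_days": i}
    for i in range(10)
]
remaining = prune(folder, max_files=5, max_age_days=6, rng=random.Random(0))
print(len(remaining))
```

Passing a seeded `random.Random` makes the random step repeatable, which helps when checking what would be deleted before touching real files.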


Note: in the MaxBayesFileAge, you've got:
"Do not define this option, if you use the bayesian engine of ASSP. Deleting
files because of there age, is wrong in this case!!!!!" It should be "their
age."  There's a bunch of other errors like this which I privately emailed
to Fritz, on request.  Should I send you that email too?


THANKS

On Sun, Sep 13, 2009 at 2:00 AM, Thomas Eckardt/eck <
thomas.ecka...@thockar.com> wrote:

> >What do you think about deleting redundant corpus emails
> >based on the subject?
>
> Redundant corpus emails are skipped/deleted based on their content (md5
> hash).
>
> Thomas
>
>
>
>
> K Post <nntp.p...@gmail.com>
> 13.09.2009 03:28
> Please reply to
> ASSP development mailing list <assp-test@lists.sourceforge.net>
>
> To
> ASSP development mailing list <assp-test@lists.sourceforge.net>
> Cc
>
> Subject
> Re: [Assp-test] fixes and news in 2.0.1_RC0.4.12
>
>
>
>
>
>
>  This is great, and thanks SO much for adding my idea of the max days for
> corrected spam.  What do you think about deleting redundant corpus emails
> based on the subject?
>
> On Sat, Sep 12, 2009 at 1:18 PM, Thomas Eckardt/eck <
> thomas.ecka...@thockar.com> wrote:
>
> > Hi all,
> >
> > I'm back.
> >
> > fixed in 4.12:
> >
> > - for some messages the mail header was transferred twice
> > - changing the display language did not work in every case
> > - the hintbox in the config part of the GUI showed wrong
> > updated/changed values
> >
> > added in 4.12
> >
> > MaxCorrectedDays
> > msg008590=Max Corrected File Age
> > msg008591=This is the number of days a error report will be kept in the
> > correctednotspam and correctedspam folders. These folders are the longterm
> > memory of ASSP, therefore the default is 1000 days.
> >
> > changed in 4.12:
> >
> > - the change language part is moved to the main config form !
> >
> >
> >
> > Thomas
> >
> > DISCLAIMER:
> > *******************************************************
> > This email and any files transmitted with it may be confidential, legally
> > privileged and protected in law and are intended solely for the use of the
> > individual to whom it is addressed.
> > This email was multiple times scanned for viruses. There should be no
> > known virus in this email!
> > *******************************************************
> >
> >
> >
>
> > ------------------------------------------------------------------------------
> > Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> > trial. Simplify your report design, integration and deployment - and focus on
> > what you do best, core application coding. Discover what's new with
> > Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> > _______________________________________________
> > Assp-test mailing list
> > Assp-test@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/assp-test
> >
------------------------------------------------------------------------------
Come build with us! The BlackBerry® Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9-12, 2009. Register now!
http://p.sf.net/sfu/devconf
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
