On Thu, Jan 31, 2002 at 09:46:15AM +0700, Olivier Nicole wrote:
> Greg,
> 
> > You don't run SpamAssassin's genetic algorithm -- I gather that only
> > Justin Mason, the prime developer, does that currently.  He has a big
> > huge pile ("the corpus") of mail, spam and non-spam, that is used to
> > feed the GA and generate the scores in everyone's
> > /usr/share/spamassassin/*.cf files.
> > 
> > Clever, eh?  I'm sure it would be possible for everyone to have their
> > own corpus of mail, and if Justin released the GA code (or has he
> > already?)  then we could all run the GA ourselves and come up with our
> > own score sets.  But why bother?
> 

One other problem is that the GA currently (IIRC) doesn't process the
messages themselves, just the list of tests each one hit.  And of course
the tests are now different from those of two versions ago, which messes
up the GA.
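
To illustrate what I mean, here is a rough sketch. It is not the actual GA
code, which I haven't looked at closely; the test names, the scores, the
threshold, and the misclassified() helper are all made up for the example.
It only shows why a log of test hits recorded under an older rule set stops
being useful once the tests change.

```python
# Hypothetical sketch: each corpus message is reduced to the set of test
# names it hit plus a spam/non-spam label. The message text is not kept.
THRESHOLD = 5.0  # assumed spam threshold, for illustration only


def misclassified(scores, hit_log, threshold=THRESHOLD):
    """Count the messages that a candidate score set gets wrong."""
    errors = 0
    for hits, is_spam in hit_log:
        # Tests that are missing from the score set contribute nothing.
        total = sum(scores.get(name, 0.0) for name in hits)
        if (total >= threshold) != is_spam:
            errors += 1
    return errors


# Hit log recorded with an older rule set (test names are invented).
old_log = [
    ({"OLD_SUBJ_TEST", "OLD_BODY_TEST"}, True),   # filed as spam
    ({"OLD_BODY_TEST"}, False),                   # filed as non-spam
]

old_scores = {"OLD_SUBJ_TEST": 3.0, "OLD_BODY_TEST": 2.5}
new_scores = {"NEW_SUBJ_TEST": 3.0, "NEW_BODY_TEST": 2.5}  # tests renamed

print(misclassified(old_scores, old_log))  # 0
print(misclassified(new_scores, old_log))  # 1
```

Since only the hits are stored, the old log can't be re-evaluated against
the new tests; the messages themselves would have to be run through the
tests again.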

Furthermore, everyone has a different idea of what spam is.  Is commercial
e-mail sent by a company that legitimately has your e-mail address spam?

I imagine that the size of the corpus is not as important as the variety of
its messages, how current it is, and how accurately it is filed.
-- 
Duncan Findlay

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
