On Thu, Jan 31, 2002 at 09:46:15AM +0700, Olivier Nicole wrote:
> Greg,
>
> > You don't run SpamAssassin's genetic algorithm -- I gather that only
> > Justin Mason, the prime developer, does that currently.  He has a big
> > huge pile ("the corpus") of mail, spam and non-spam, that is used to
> > feed the GA and generate the scores in everyone's
> > /usr/share/spamassassin/*.cf files.
> >
> > Clever, eh?  I'm sure it would be possible for everyone to have their
> > own corpus of mail, and if Justin released the GA code (or has he
> > already?) then we could all run the GA ourselves and come up with our
> > own score sets.  But why bother?
>
One other problem is that the GA currently (IIRC) doesn't process the
messages themselves, just the list of tests each message hit.  Of
course, the tests now are different from those of 2 versions ago, which
messes up the GA.

Furthermore, everyone has a different idea of what spam is.  Is
commercial e-mail sent by a company that legitimately has your e-mail
address spam?

I imagine that the size of the corpus is not as important as the
variety of its messages, how current it is, and how accurately the
messages are filed as spam or non-spam.

-- 
Duncan Findlay

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
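
To make the "just the tests hit" point concrete, here is a rough
sketch in Python. It is not the actual GA code. The test names, the
scores, the corpus records and the 5.0 threshold are all made up for
illustration. It only shows how a score set would be judged when each
message has been reduced to the set of tests it hit, and what happens
when a test is renamed after the hits were recorded.

```python
# Hypothetical sketch: the corpus is reduced to (is_spam, tests_hit)
# records, so the optimizer never sees message text, only which tests fired.
THRESHOLD = 5.0  # assumed spam threshold, for illustration only

# Made-up test names and records, not real SpamAssassin rules or data.
corpus = [
    (True, {"OLD_SUBJECT_TEST", "HTML_TEST"}),
    (True, {"OLD_SUBJECT_TEST"}),
    (False, {"HTML_TEST"}),
    (False, set()),
]


def total_score(tests_hit, scores):
    # Tests that are missing from the score set contribute nothing.
    return sum(scores.get(name, 0.0) for name in tests_hit)


def count_errors(corpus, scores, threshold=THRESHOLD):
    """Return (false_positives, false_negatives) for one candidate score set."""
    fp = fn = 0
    for is_spam, tests_hit in corpus:
        flagged = total_score(tests_hit, scores) >= threshold
        if flagged and not is_spam:
            fp += 1
        elif is_spam and not flagged:
            fn += 1
    return fp, fn


# Score set whose test names match the hits recorded in the corpus.
scores_then = {"OLD_SUBJECT_TEST": 5.5, "HTML_TEST": 1.0}
# Later rule set: the subject test was renamed, so the recorded hits are stale.
scores_now = {"NEW_SUBJECT_TEST": 5.5, "HTML_TEST": 1.0}

print(count_errors(corpus, scores_then))
print(count_errors(corpus, scores_now))
```

With the renamed test, both spam records fall below the threshold, even
though the messages themselves have not changed. Re-running the current
tests over the original messages would avoid this, but that needs the
messages and not just the recorded hits.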