On Thu, Jan 31, 2002 at 09:46:15AM +0700, Olivier Nicole wrote:
> Greg,
>
> > You don't run SpamAssassin's genetic algorithm -- I gather that only
> > Justin Mason, the prime developer, does that currently. He has a big
> > huge pile ("the corpus") of mail, spam and non-spam, that is used to
> > feed the GA and generate the scores in everyone's
> > /usr/share/spamassassin/*.cf files.
> >
> > Clever, eh? I'm sure it would be possible for everyone to have their
> > own corpus of mail, and if Justin released the GA code (or has he
> > already?) then we could all run the GA ourselves and come up with our
> > own score sets. But why bother?
>
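One other problem is that the GA currently (IIRC) doesn't process the
messages themselves, just the record of which tests each one hit. Of
course, the tests are now different from those of 2 versions ago, so the
recorded hits no longer match the current rules, which messes up the GA.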
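To make that concrete, here is a rough sketch (in Python, not the actual GA
code) of what scoring against a log of test hits could look like. The test
names, the log layout, the scores and the 5.0 threshold are all assumptions
made up for illustration. The point is only that a candidate score set is
judged from the recorded hits, and that a log recorded against an older
rule set can mention tests the current score set doesn't know about.

```python
# Hypothetical corpus log: one record per message, holding only the names of
# the tests that fired plus the hand-assigned label. No message text is kept.
HITS_LOG = [
    ({"TEST_A", "TEST_B"}, True),   # filed as spam
    ({"TEST_B"}, False),            # filed as non-spam
    ({"TEST_A", "TEST_C"}, True),
    (set(), False),
]

THRESHOLD = 5.0  # assumed spam threshold, for illustration only


def misclassified(scores, log, threshold=THRESHOLD):
    """Count messages whose summed test scores disagree with their label."""
    errors = 0
    for tests_hit, is_spam in log:
        # Tests missing from the score set contribute nothing.
        total = sum(scores.get(name, 0.0) for name in tests_hit)
        if (total >= threshold) != is_spam:
            errors += 1
    return errors


def unknown_tests(scores, log):
    """Return test names that appear in the log but not in the score set."""
    logged = set()
    for tests_hit, _ in log:
        logged |= tests_hit
    return logged - set(scores)


# Two made-up candidate score sets, as an optimizer might compare them.
candidate_1 = {"TEST_A": 3.0, "TEST_B": 3.0, "TEST_C": 1.0}
candidate_2 = {"TEST_A": 4.0, "TEST_B": 1.5, "TEST_C": 1.0}

# A score set for a newer rule set in which TEST_C no longer exists.
newer_rules = {"TEST_A": 4.0, "TEST_B": 1.5}

if __name__ == "__main__":
    print(misclassified(candidate_1, HITS_LOG))
    print(misclassified(candidate_2, HITS_LOG))
    print(unknown_tests(newer_rules, HITS_LOG))
```

An optimizer would keep whichever candidate misclassifies fewer messages.
The last call shows the staleness problem: the old log still refers to a
test that the newer score set has no entry for, so those hits are scored as
zero unless the tests are re-run over the original messages.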
Furthermore, everyone has a different idea of what spam is. Is commercial
e-mail spam if it was sent by a company that legitimately has your e-mail
address?
I imagine that the size of the corpus is not as important as the variety of
its messages, how current they are, and how accurately they are filed.
--
Duncan Findlay