I'm trying to run the GA against my corpus of mail to see how much improvement I can get from the new rule weightings. A couple of questions:
1) The documentation says a large corpus is necessary. How large are we talking? 1000? 10,000? 1,000,000 messages?
2) The documentation in the masses/ directory seems misleading. README refers to galib245, but the Makefile refers to PGALIB. I found the following in a posting from a couple years ago:
"The code is available. There are actually two different GAs in the /masses directory -- one, evolve.cxx is Justin's, based on a library called galib; the other, craig-evolve.cxx is mine, based on pgapack so it can make use of multiple CPUs (and even multiple nodes if your computational desires swing that way)."
I only see craig-evolve.c, not evolve.cxx, but the README refers to the GAlib apparently required by evolve.cxx. Is the README just out of date, or am I missing something?
- Philip
----------
Philip Tucker
Zix Research Center
Anti-Spam Team Lead
214.370.2068