(delurking in a net cafe somewhere in Oz ;)

>> * Default score = 0
> I think that's probably a good idea for the test as it stands because
> it's a fairly uncontrolled score applied equally to a /large/
> proportion of the world.

I agree. If the test is added it should be 0 by default.

>> * Separate tests for each tld (so that score for each tld can be
>> controlled)
> This would be a really good idea ... combined with the GA scoring
> system. Of course, you probably want to add tests for every TLD, not
> just the set there.
> you can let the GA work over the corpus and calculate what is and
> isn't a source of SPAM at this point in time.

Mind you, I don't think this is a good idea; it will make SA even more
westerner-oriented. :( Pretty much all the GA corpus is from western
sources and in western charsets, so the GA will totally skew it.

BTW I did some statistical analysis of TLD-spam correlation, using the
number of hosts or domains in that TLD to weight against it, for use in
the round-the-world test. That's in a comment at the top of
EvalTests.pm, might be worth a look.

--j.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
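
For anyone wondering what weighting TLD spam counts against the number of hosts might look like, here is a rough sketch in Python. It is not the analysis from the EvalTests.pm comment, which is not reproduced in this message. The function name `tld_spam_weights`, the placeholder TLDs "aa", "bb" and "cc", and all the counts are made up for illustration. The assumption is that the weight is simply spam messages per host, rescaled so the average weight is 1.0.

```python
def tld_spam_weights(spam_counts, host_counts):
    """Spam-per-host rate for each TLD, rescaled so the mean weight is 1.0.

    Hypothetical helper: both arguments are dicts keyed by TLD.
    TLDs with no known host count are skipped rather than guessed at.
    """
    rates = {}
    for tld, spam in spam_counts.items():
        hosts = host_counts.get(tld, 0)
        if hosts <= 0:
            continue  # cannot normalise without a host count
        rates[tld] = spam / hosts

    if not rates:
        return {}

    mean = sum(rates.values()) / len(rates)
    if mean == 0:
        return {tld: 0.0 for tld in rates}
    return {tld: rate / mean for tld, rate in rates.items()}


# Made-up numbers purely for illustration, not real measurements.
spam_counts = {"aa": 50, "bb": 50, "cc": 10}
host_counts = {"aa": 1000, "bb": 100000, "cc": 0}

weights = tld_spam_weights(spam_counts, host_counts)
for tld in sorted(weights):
    print(tld, round(weights[tld], 3))
```

Here "aa" and "bb" send the same amount of spam, but "aa" has far fewer hosts, so it gets the larger weight. Note that a sketch like this still inherits whatever bias the spam corpus has, which is the concern raised above about a mostly western corpus.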