(delurking in a net cafe somewhere in Oz ;)

>> * Default score = 0
> I think that's probably a good idea for the test as it stands because
> it's a fairly uncontrolled score applied equally to a /large/
> proportion of the world.

I agree.  If the test is added it should be 0 by default.

>> * Separate tests for each tld (so that score for each tld can be
>> controlled)
> This would be a really good idea ... combined with the GA scoring
> system. Of course, you probably want to add tests for every TLD, not
>just the set there.
> you can let the GA work over the corpus and calculate what is and
> isn't a source of SPAM at this point in time.

Mind you, I don't think this is a good idea; it would make SA even more
Western-oriented. :(  Pretty much all of the GA corpus comes from Western
sources and uses Western charsets, so the GA would badly skew the per-TLD
scores.

BTW, I did some statistical analysis of the TLD-spam correlation for use in
the round-the-world test, weighting each TLD by the number of hosts or
domains in it. It's in a comment at the top of EvalTests.pm and might be
worth a look.
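For anyone who hasn't opened EvalTests.pm yet, here is one way this kind of size weighting can work: compare each TLD's share of the spam with its share of all hosts, so that a big TLD isn't flagged just for being big. This is a hypothetical sketch, not the formula in EvalTests.pm. The function name, the input dicts and the TLD labels are all made up for illustration, and the counts in the usage example are invented, not real data.

```python
from typing import Dict


def weighted_tld_spamminess(spam_counts: Dict[str, int],
                            host_counts: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical helper: spam share divided by host share, per TLD.

    1.0 means the TLD appears in spam exactly as often as its size predicts,
    above 1.0 means more often, below 1.0 means less often.
    """
    # Only count spam for TLDs we have host counts for.
    total_spam = sum(spam_counts.get(tld, 0) for tld in host_counts)
    total_hosts = sum(host_counts.values())
    if total_spam == 0 or total_hosts == 0:
        # Nothing to weight against; report every TLD as neutral-zero.
        return {tld: 0.0 for tld in host_counts}

    result: Dict[str, float] = {}
    for tld, hosts in host_counts.items():
        if hosts == 0:
            continue  # avoid dividing by a zero-sized TLD
        spam_share = spam_counts.get(tld, 0) / total_spam
        host_share = hosts / total_hosts
        result[tld] = spam_share / host_share
    return result


# Invented counts for three hypothetical TLDs, purely to show the arithmetic.
print(weighted_tld_spamminess({"tld_a": 80, "tld_b": 5, "tld_c": 15},
                              {"tld_a": 800, "tld_b": 100, "tld_c": 100}))
```

With those made-up numbers, tld_a comes out at about 1.0 even though it carries 80% of the spam, because it also holds 80% of the hosts, while the small tld_c comes out at about 1.5.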

--j.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
