On Fri, 4 Dec 2009, Greg Troxel wrote:
A problem with the spam%/ham% checking methodology is that it makes the accreditation look reasonable for corpora that contain lots of requested commercial mail. That's certainly fine for those people, but the outcome is very different for those who don't ask for such mail: they're left with only the spam.
Agreed. Though reasonably speaking, the volume of 'accredited' spam should come out about the same as an overall percentage, whichever corpus you measure it on. So it should still raise a 'red flag' when it gets too large, regardless of how much ham benefits from the rule.
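
To put rough numbers on that, here is a small sketch in Python. Everything in it is made up for illustration: the corpus names, the hit counts, and the 1% limit are hypothetical, and it assumes spam% means "share of the spam that hits the rule" and ham% the same for ham. It only shows that the spam% figure can be checked on its own, without looking at ham%.

```python
def hit_rates(spam_hits, spam_total, ham_hits, ham_total):
    """Return (spam%, ham%): the share of each class that hits the rule."""
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    return spam_pct, ham_pct


# Hypothetical limit on spam% above which the rule deserves a second look.
SPAM_PCT_LIMIT = 1.0


def red_flag(spam_pct, limit=SPAM_PCT_LIMIT):
    """Flag the rule on spam% alone, ignoring how much ham it helps."""
    return spam_pct > limit


# Two hypothetical corpora: same accredited spam, very different ham.
corpora = {
    "wants_commercial_mail": dict(spam_hits=50, spam_total=10000,
                                  ham_hits=800, ham_total=10000),
    "no_commercial_mail": dict(spam_hits=50, spam_total=10000,
                               ham_hits=0, ham_total=10000),
}

for name, counts in corpora.items():
    spam_pct, ham_pct = hit_rates(**counts)
    print(f"{name}: spam%={spam_pct:.2f} ham%={ham_pct:.2f} "
          f"flag={red_flag(spam_pct)}")
```

With these invented counts the ham% differs a lot between the two corpora while the spam% is identical, so a check on spam% alone treats both the same way.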
- C