On 1/20/2011 7:23 AM, R - elists wrote:
initially this came across as a really suspect idea...
i.e., one man's junk is another man's treasure
Ham is a lot easier to define than Spam. Ham is simply anything that
you subscribed for.
for a moment, it appeared we were gonna need to review the good and the bad
of spam-l to avoid serious SA list issues.
statistically speaking, this shouldn't sway the scoring substantially anyway,
would it?
You are correct. This is more of a tool to have *some* variety in the
ham corpus, to make it possible to flag rules in need of scrutiny. For
example, prior to 3.3.x many of our rules were utterly broken with
Japanese mail. We had no idea of this until I added a few thousand
Japanese messages to the ham corpus. JM understood the problem and fixed
those rules.
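
To illustrate how a more varied ham corpus can surface rules in need of scrutiny, here is a rough sketch that computes per-rule hit rates over a ham subset and lists the rules firing above a threshold. The data layout, the rule names, and the threshold are all made up for illustration; this is not the project's actual rule QA tooling.

```python
from collections import Counter

def flag_rules(ham_hits, threshold=0.01):
    """Return {rule: hit_rate} for rules that fire on too much ham.

    ham_hits  -- list of sets, one per ham message, holding the names of
                 the rules that fired on it (hypothetical input format)
    threshold -- fraction of ham above which a rule is considered suspect
                 (0.01 is an arbitrary illustrative default)
    """
    total = len(ham_hits)
    if total == 0:
        return {}
    # Count how many ham messages each rule fired on.
    counts = Counter(rule for hits in ham_hits for rule in hits)
    return {rule: n / total for rule, n in counts.items() if n / total > threshold}

# Hypothetical subset, e.g. newly added Japanese ham; rule names are invented.
japanese_ham = [
    {"HYPOTHETICAL_CHARSET_RULE"},
    {"HYPOTHETICAL_CHARSET_RULE", "OTHER_RULE"},
    set(),
    {"HYPOTHETICAL_CHARSET_RULE"},
]
suspects = flag_rules(japanese_ham, threshold=0.5)
```

A rule that looks fine on the overall corpus but shows a high hit rate on one subset is the kind that deserves a closer look.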
what should be known so that bad data is not allowed into the ham corpus?
The previous discussion described a sort of "tagged sender" ham trap.
This simple process automatically excludes extraneous mail in cases
where the address was shared with "affiliates" or spammer lists. We
will also be careful to stick to reputable companies and orgs for the
ham trap.
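
A minimal sketch of how such a tagged-sender check could work, assuming each subscription is made with its own plus-tagged address and that we record which domain each tag was given to. The addresses, the `TAG_TO_DOMAIN` table, and the function names are hypothetical and only illustrate the idea; they are not the actual trap setup.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical mapping: tag used when subscribing -> domain that was given the tag.
TAG_TO_DOMAIN = {
    "acme-news": "acme.example",
    "orgweekly": "org.example",
}

def extract_tag(recipient):
    """Return the plus-tag from 'user+tag@host', or None if there is no tag."""
    local = recipient.rpartition("@")[0]
    _, sep, tag = local.partition("+")
    return tag.lower() if sep and tag else None

def is_trusted_ham(raw_message):
    """Accept a message as ham only if the sender's domain owns the tag."""
    msg = message_from_string(raw_message)
    _, to_addr = parseaddr(msg.get("To", ""))
    _, from_addr = parseaddr(msg.get("From", ""))
    expected = TAG_TO_DOMAIN.get(extract_tag(to_addr))
    if expected is None:
        # Untagged or unknown tag: not something we subscribed for.
        return False
    sender_domain = from_addr.rpartition("@")[2].lower()
    # Allow the registered domain itself or any of its subdomains.
    return sender_domain == expected or sender_domain.endswith("." + expected)
```

Mail that arrives at a tagged address from any other domain is simply left out of the ham corpus. Since the From header can be forged, a real setup would likely also want to look at the envelope sender or a DKIM result; that part is left out of this sketch.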
Warren