On 1/20/2011 7:23 AM, R - elists wrote:

initially this came across as a really suspect idea...

i.e., one man's junk is another man's treasure

Ham is a lot easier to define than Spam. Ham is simply anything that you subscribed to.


for a moment, it appeared we were gonna need to review the good and the bad
of spam-l to avoid serious SA list issues.

statistically speaking, this shouldn't sway the scoring substantially anyway,
would it?

You are correct. This is more of a tool to have *some* variety in the ham corpus, to make it possible to flag rules in need of scrutiny. For example, prior to 3.3.x many of our rules were utterly broken with Japanese mail. We had no idea of this until I added a few thousand Japanese messages to the ham corpus. JM understood the problem and fixed those rules.
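
To illustrate what "flag rules in need of scrutiny" could look like, here is a minimal Python sketch. It is not the project's actual tooling: the input format (one list of rule names per ham message), the function name, the threshold, and the rule names are all assumptions made for illustration.

```python
from collections import Counter

def flag_rules(ham_hits, threshold=0.01):
    """Return {rule: ham hit rate} for rules firing on more than
    `threshold` of the ham messages.

    ham_hits: a list with one iterable of rule names per ham message
    (hypothetical input format, assumed for this sketch).
    """
    total = len(ham_hits)
    if total == 0:
        return {}
    # Count each rule at most once per message.
    counts = Counter(rule for hits in ham_hits for rule in set(hits))
    return {rule: n / total for rule, n in counts.items() if n / total > threshold}

# Hypothetical rule names, not real SpamAssassin rules.
sample = [["HYPO_RULE_A"], ["HYPO_RULE_A", "HYPO_RULE_B"], [], []]
flagged = flag_rules(sample, threshold=0.3)
```

A rule that fires often on a new slice of ham (say, mail in a language the corpus lacked before) would show up here as a candidate for review.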


what should be known so that bad data is not allowed into the HAM corpus ?


The previous discussion described a sort of "tagged sender" ham trap. This simple process automatically excludes extraneous mail in cases where the address was shared with "affiliates" or leaked to spammer lists. We will also be careful to stick to reputable companies and orgs for the ham trap.
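
As a rough sketch of the "tagged sender" idea, the Python below accepts a message as ham only when it arrives at a tagged address and comes from the one sender that address was given to. The addresses, domains, and function names are hypothetical, and trusting the From header is a simplification; a real setup would more likely check the envelope sender or a DKIM signature.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical mapping: each tagged address was handed to exactly one sender.
TAGGED_SENDERS = {
    "hamtrap+acme@example.org": "acme.example",
    "hamtrap+widgets@example.org": "widgets.example",
}

def header_address(msg, name):
    """Return the lowercased bare address from a header, or '' if absent."""
    _, addr = parseaddr(msg.get(name, ""))
    return addr.lower()

def is_trusted_ham(raw_message):
    """True only if the tagged recipient matches the expected sender domain."""
    msg = message_from_string(raw_message)
    expected = TAGGED_SENDERS.get(header_address(msg, "To"))
    if expected is None:
        return False  # not one of our tagged addresses
    domain = header_address(msg, "From").rpartition("@")[2]
    # Accept the expected domain itself or any of its subdomains.
    return domain == expected or domain.endswith("." + expected)
```

Mail sent to `hamtrap+acme@example.org` from any other domain is simply left out of the ham corpus, which is how an address passed on to "affiliates" gets excluded without manual review.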

Warren
