On 2011/03/10 2:17 PM, Adam Katz wrote:
On 03/10/2011 07:59 AM, Adam Moffett wrote:
I'd be happy to contribute, but we bounce or outright delete high
scoring spam.
After reading these wiki articles:
http://wiki.apache.org/spamassassin/HandClassifiedCorpora
http://wiki.apache.org/spamassassin/CorpusCleaning
I get the impression that they want a representative sample of your
spam, and that I will skew things in a bad way if I only submit the spam
that SpamAssassin already scored low.
What is your bounce/delete threshold? If it's high enough, I would say
that the skew it presents to the scores would actually stand to help
more than hurt (as long as we still have plenty of other non-trap
sources that contribute un-capped spam).
I figure spam capped at 15+ points would be fine, but you'll need
developer consensus on that.
Wouldn't spam already scored at 15+ be considered a little redundant in
the corpus? If not, I'm certain I could modify my config to keep a copy
for processing in the mass checks.
--
/Jason