On 2014-04-05 09:14, John Hardin wrote:
> On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:
>> I want to know how long it takes for custom signatures to be updated
>> by sa-update.
>
> Daily, if the corpora are sufficient for masscheck scoring to run.
>
> At the moment the masscheck corpus is ham-starved. There's not quite
> enough ham available for reliable scores to be generated and published.
>
> Once again, participation as a mass-checker is solicited, especially if
> you can provide a non-English ham corpus. If you have access to
> thousands of reliably categorized messages and can set up a box to run
> SpamAssassin over them and test the performance of the base rules,
> please consider becoming a masscheck contributor. The content of
> private messages is not exposed by this process; only the rule hits
> are public.
>
> If you can do this, see the wiki for the process and contact Kevin
> McGrail for upload credentials. Thanks!
I've been idly considering how to contribute, but having read the wiki
articles, I have a few questions:
Is older ham useful? The wiki specifically says that older spam isn't
useful, and explains why, but I suspect older ham is still useful, since
old mail clients and legitimately sent mail never really die. Otherwise,
I could filter it by date.
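If the answer is that only recent ham is wanted, the date filter could look something like the minimal sketch below. It assumes the ham is stored as mbox files and uses Python's standard `mailbox` and `email.utils` modules; the function names, file paths and cutoff date are hypothetical placeholders, and messages with a missing or unparseable Date header are simply skipped rather than guessed at.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Return the Date header as an aware datetime, or None if missing/unparseable."""
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        dt = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, 3.10+ raises ValueError on bad dates.
        return None
    if dt.tzinfo is None:
        # "-0000" dates parse as naive datetimes; treat them as UTC so the
        # comparison with an aware cutoff does not fail.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def filter_mbox_by_date(src_path, dst_path, cutoff):
    """Copy messages dated on or after `cutoff` (an aware datetime) to dst_path.

    Hypothetical helper; assumes no other process is writing to either file.
    Returns (kept, skipped); undated or unparseable messages count as skipped.
    """
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    kept = skipped = 0
    try:
        for msg in src:
            dt = message_date(msg)
            if dt is not None and dt >= cutoff:
                dst.add(msg)
                kept += 1
            else:
                skipped += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept, skipped
```

For example, `filter_mbox_by_date("ham.mbox", "ham-recent.mbox", datetime(2012, 1, 1, tzinfo=timezone.utc))` would write only the messages dated 2012 or later to a new file and leave the original archive untouched.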
Is mail from users' "Sent" folders of any use? I suspect not, since it
won't necessarily have a Received header yet (although it might,
depending on how the user sent the message), so direct-to-MX and similar
rules would skew the results.
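Rather than guessing, one could measure how much of a Sent folder actually lacks Received headers before deciding. The sketch below is a hypothetical helper that assumes the folder has been exported as an mbox file; it only counts messages with no Received header at all and says nothing about which SpamAssassin rules would actually fire on them.

```python
import mailbox


def received_header_stats(path):
    """Count messages in an mbox and how many carry no Received header.

    Hypothetical helper; returns (total, without_received).
    """
    box = mailbox.mbox(path, create=False)
    total = without_received = 0
    try:
        for msg in box:
            total += 1
            # get_all() returns None when the header is absent.
            if msg.get_all("Received") is None:
                without_received += 1
    finally:
        box.close()
    return total, without_received
```

A high `without_received` ratio from something like `received_header_stats("sent.mbox")` (hypothetical path) would support leaving that folder out of the submission.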
Is a ham-only corpus submission useful? Our ham is well cleaned, but we
don't archive spam on an ongoing basis, and users mostly just delete it.
Most of our users do archive and retain their ham, though, so depending
on what the results look like, it might be a useful data source.
--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren