On Mon, 7 Apr 2014, Dave Warren wrote:
On 2014-04-06 17:21, John Hardin wrote:
On Sun, 6 Apr 2014, Dave Warren wrote:
> Is older ham useful? It specifically mentions that older spam isn't
> useful, and why, but I'm thinking older ham is probably useful since old
> mail clients and legitimately sent mail never dies. But I could filter
> based on date.
There's some debate about that. :)
I personally agree with you. Others disagree.
I've been giving it some thought and I think that perhaps limiting it to the
last few months will make it easier to get a sane set of TRUSTED_NETWORKS and
INTERNAL_NETWORKS; I've got mail going back to ~2002 but no real recollection
of how things were set up or named prior to 2007 or so.
Initially I'll limit it to mail from the last couple of months, but perhaps
later expand that to 24-36 months for non-spam and 6 months for spam. Is that
sane/reasonable?
Sure.
Yes, ham-only masscheck submissions would be very welcome.
Perfect, glad to hear it. At this point I've built a dedicated box to run the
masscheck scripts, so now it's just a matter of putting together a corpus and
doing some sanity checking and testing.
My current thought is to take user-fed spam and non-spam folders and place
copies of messages into a staging path which will then be reviewed before
being added to the corpus for learning. Hopefully I'll be ready to go live
within a day or two.
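
For illustration only, the age limits floated above (up to 36 months for
non-spam, 6 months for spam) could be checked during the staging step with
something like the sketch below. It is not part of the masscheck scripts; the
names MAX_AGE and within_age_limit are hypothetical, months are approximated
as 30 days, and it trusts the Date header, which spam can forge.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical limits, taken from the numbers mentioned in this thread.
MAX_AGE = {
    "ham": timedelta(days=36 * 30),   # roughly 36 months
    "spam": timedelta(days=6 * 30),   # roughly 6 months
}

def within_age_limit(raw_message, kind, now=None):
    """Return True if the message's Date header is recent enough for `kind`."""
    now = now or datetime.now(timezone.utc)
    date_header = message_from_string(raw_message)["Date"]
    if date_header is None:
        return False  # no Date header: leave it for manual review
    try:
        sent = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError):
        return False  # unparseable date: leave it for manual review
    if sent.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    # Reject future-dated mail as well as mail older than the limit.
    return timedelta(0) <= now - sent <= MAX_AGE[kind]
```

Messages that fail the check would simply stay out of the staging path, or
be set aside for a manual look.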
Thanks for your participation!
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...every time I sit down in front of a Windows machine I feel as
if the computer is just a place for the manufacturers to put their
advertising. -- fwadling on Y! SCOX
-----------------------------------------------------------------------
6 days until Thomas Jefferson's 271st Birthday