Re: SPAM/Phish and Ham E-mail Dataset

2011-01-14 Thread Mahmoud Khonji
On Thu, Jan 13, 2011 at 2:23 AM, mouss wrote:
> sigh. if you can't understand what "privacy" means, then you are part of
> the problem.
A ham corpus "may" conflict with privacy, but it does not necessarily have to. An example is the old ~2005 ham corpus. People can decide which emails to share, and

Re: SPAM/Phish and Ham E-mail Dataset

2011-01-13 Thread David F. Skoll
On Thu, 13 Jan 2011 13:51:14 + RW wrote:
> Is there anything to prevent spammers signing up and using your
> databases to autogenerate spam?
Not really, but then we only make our database available to customers using our commercial product, so the cost would probably deter spammers.
> It so

Re: SPAM/Phish and Ham E-mail Dataset

2011-01-13 Thread RW
On Wed, 12 Jan 2011 21:25:06 -0500 "David F. Skoll" wrote:
> On Wed, 12 Jan 2011 23:23:39 +0100
> mouss wrote:
>
> [...]
>
> > you need to train with _your_mail. do not train with somebody else's
> > mail. one of the defence args is that attackers can't guess your
> > setup. if every one of us

Re: SPAM/Phish and Ham E-mail Dataset

2011-01-12 Thread David F. Skoll
On Wed, 12 Jan 2011 23:23:39 +0100 mouss wrote:
[...]
> you need to train with _your_mail. do not train with somebody else's
> mail. one of the defence args is that attackers can't guess your
> setup. if every one of us uses the same corpus then it'll be easy for
> an attacker to get around.
Th

Re: SPAM/Phish and Ham E-mail Dataset

2011-01-12 Thread Marco Ribeiro
http://untroubled.org/spam/ --> for spam, updated daily.
--Marco Túlio
*Sola Scriptura, Sola Fide, Sola Gratia, Solus Christus, Soli Deo Glória*
On Wed, Jan 12, 2011 at 8:23 PM, mouss wrote:
> On 12/01/2011 23:02, Mahmoud Khonji wrote:
> > I would highly appreciate if anyone is able to sen

Re: SPAM/Phish and Ham E-mail Dataset

2011-01-12 Thread mouss
On 12/01/2011 23:02, Mahmoud Khonji wrote:
> I would highly appreciate if anyone is able to send me his SPAM/Ham email
> collection.
sigh. if you can't understand what "privacy" means, then you are part of the problem.
>
> I need it to train and test classifiers.
you need to train with _yo

SPAM/Phish and Ham E-mail Dataset

2011-01-12 Thread Mahmoud Khonji
I would highly appreciate it if anyone could send me their SPAM/Ham email collection. I need it to train and test classifiers. The issue with the available corpora is that they are outdated. They generally date back to 2005, and a lot has changed since then -- we've got SPAMers with spell checkers at le