Hi,

The spam corpus I used is actually a set of messages that SA had already caught. I don't know how many of them are unique (maybe 80%?), but there are more than 12,000 in total. Since SA already flags these, the corpus won't help SA itself much, but I'm guessing it is still useful for seeding Bayes.
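For what it's worth, the "how many are unique" guess could be checked with something like the rough sketch below. It assumes the corpus is unpacked into a directory with one raw message per file (the layout inside the tgz isn't stated here), and the names body_digest and count_unique are just made up for illustration. It only catches copies whose bodies are identical after whitespace is collapsed, so near-duplicates with randomized text would still count as distinct.

```python
import hashlib
from email import message_from_bytes
from pathlib import Path


def body_digest(raw):
    """Hash only the body, so copies that differ just in headers match."""
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        # Collapse whitespace so trivial reflowing does not change the hash.
        parts.append(b" ".join(payload.split()))
    return hashlib.sha256(b"\n".join(parts)).hexdigest()


def count_unique(corpus_dir):
    """Return (total, unique) for a directory with one message per file."""
    digests = [
        body_digest(path.read_bytes())
        for path in Path(corpus_dir).iterdir()
        if path.is_file()
    ]
    return len(digests), len(set(digests))
```

Calling count_unique on the unpacked directory returns the total and unique counts as a pair, which gives the unique fraction directly.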
The tgz file is 27 MB. I can make it available via FTP if you'd like.

Ricardo

----- Original Message Follows -----
> I haven't seen one :)
>
> I think you have a pretty solid start... :)
>
> care to share your SPAM corpus?
>
> CT
>
> ----- Original Message -----
> From: "Ricardo Kleemann" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, August 15, 2003 10:57 AM
> Subject: [SAtalk] archives for seeding bayes?
>
> > Hi,
> >
> > I've trained my bayes database with about 12,000 spam
> > and 7,000 ham messages, but I was wondering if there are
> > much larger archives available for seeding bayes?
> >
> > Thanks
> > Ricardo

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk