On Sun, Oct 23, 2011 at 06:35:02PM -0400, Marios Titas wrote:
> Hi all,
> 
> I was recently given a list of 10,000 posts from an internet forum.
> Out of those, 9,000 had been approved by the site's moderators and the
> remaining were rejected. I was wondering if I could use this data set
> to play with Bayesian filtering in spamassassin.

Why don't you just try something like dspam and its "DataSource document"
option?  It should process non-email data as-is and probably work much
more efficiently anyway.  SA Bayes is heavily tuned for email messages and
their quirks.

Of course, it would be interesting if someone put up a comparison.
