Re: Inappropriate content detection

2006-02-06 Thread Daniel Noll
Jason Polites wrote: There is also an open source Java anti-spam API which does a Bayesian scan of email content (plus other stuff). You could retrofit it to work with raw text. There is also Classifier4J, which is more geared toward pure classification (comes with a Bayesian classifier but oth

Re: Inappropriate content detection

2006-02-06 Thread Jason Polites
: "Gwyn Carwardine" <[EMAIL PROTECTED]> To: Sent: Tuesday, February 07, 2006 12:58 AM Subject: RE: Inappropriate content detection The good bit about Bayesian is that it continuously learns. The downside is that you have to teach it. Not quite as simple as a list of rude wo
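The "learns, but has to be taught" trade-off described above can be illustrated with a minimal sketch of a two-class naive Bayes filter. This is not the API of Classifier4J or of any other library mentioned in this thread; the class name NaiveBayesPostFilter, its methods, and the training sentences are all hypothetical, and the tokenizer is a plain regex split rather than a Lucene analyzer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-class naive Bayes filter (illustration only, not a library API). */
final class NaiveBayesPostFilter {
    private final Map<String, Integer> badCounts = new HashMap<>();
    private final Map<String, Integer> okCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int badTokens, okTokens, badPosts, okPosts;

    /** Lowercases the text and splits on anything that is not a letter, digit or apostrophe. */
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+");
    }

    /** "Teaching" step: record the word counts of one hand-labelled post. */
    void train(String text, boolean inappropriate) {
        Map<String, Integer> counts = inappropriate ? badCounts : okCounts;
        for (String tok : tokenize(text)) {
            if (tok.isEmpty()) continue;
            counts.merge(tok, 1, Integer::sum);
            vocabulary.add(tok);
            if (inappropriate) badTokens++; else okTokens++;
        }
        if (inappropriate) badPosts++; else okPosts++;
    }

    /** Log of (smoothed class prior) times (smoothed word likelihoods) for one class. */
    private double logScore(String text, Map<String, Integer> counts, int classTokens, int classPosts) {
        double score = Math.log((classPosts + 1.0) / (badPosts + okPosts + 2.0));
        // Add-one smoothing; the extra +1 reserves a slot for words never seen in training.
        double denominator = classTokens + vocabulary.size() + 1.0;
        for (String tok : tokenize(text)) {
            if (tok.isEmpty()) continue;
            score += Math.log((counts.getOrDefault(tok, 0) + 1.0) / denominator);
        }
        return score;
    }

    /** Flags a post when the "inappropriate" class scores strictly higher. */
    boolean isInappropriate(String text) {
        return logScore(text, badCounts, badTokens, badPosts)
                > logScore(text, okCounts, okTokens, okPosts);
    }

    public static void main(String[] args) {
        NaiveBayesPostFilter filter = new NaiveBayesPostFilter();
        filter.train("you are a stupid idiot", true);
        filter.train("thanks for the helpful answer", false);
        System.out.println(filter.isInappropriate("what an idiot"));
    }
}
```

Each call to train() is the teaching step; the filter keeps adapting as moderators label more posts, which a fixed word list cannot do. A real deployment would need far more training data and a better tokenizer than this sketch assumes.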

RE: Inappropriate content detection

2006-02-06 Thread Gwyn Carwardine
-Gwyn -Original Message- From: Jeff Thorne [mailto:[EMAIL PROTECTED] Sent: 06 February 2006 13:30 To: java-user@lucene.apache.org Subject: RE: Inappropriate content detection The site will have a million+ posts. I am not familiar with Bayesian algorithms. Is there an off-the-shelf API tha

RE: Inappropriate content detection

2006-02-06 Thread Jeff Thorne
:[EMAIL PROTECTED] Sent: Sunday, February 05, 2006 8:40 PM To: java-user@lucene.apache.org Subject: Re: Inappropriate content detection Hi, what scale is this website? Millions of posts or under? Wouldn't it be easier to use a Bayesian algorithm to scan each new post before it is posted to d

Re: Inappropriate content detection

2006-02-05 Thread gekkokid
Hi, what scale is this website? Millions of posts or under? Wouldn't it be easier to use a Bayesian algorithm to scan each new post before it is posted to detect whether it is acceptable or not? Just a quick idea off the top of my head. _gk - Original Message - From: "Jeff Thorne" <[EMAIL PRO

Re: Inappropriate content detection

2006-02-05 Thread Jeff Rodenburg
You can generate a token stream for a block of text without having to index it. Take a look at the highlighter code; it does this very thing. On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote: > > I am trying to figure out whether or not Lucene is an appropriate solution > for a problem that our

Re: Inappropriate content detection

2006-02-05 Thread Daniel Noll
Jeff Thorne wrote: I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our site faces. I would like to analyze each user's post for various words and expressions before publishing their post to the DB. I am reading through the Lucene in Action book and