Re: Inappropriate content detection

2006-02-06 Thread Daniel Noll
Jason Polites wrote: There is also an open source Java anti-spam API which does a Bayesian scan of email content (plus other stuff). You could retrofit it to work with raw text. There is also Classifier4J, which is more geared toward pure classification (comes with a Bayesian classifier but oth

Re: Inappropriate content detection

2006-02-06 Thread Jason Polites
: "Gwyn Carwardine" <[EMAIL PROTECTED]> To: Sent: Tuesday, February 07, 2006 12:58 AM Subject: RE: Inappropriate content detection The good bit about Bayesian is that it continuously learns. The downside is that you have to teach it. Not quite as simple as a list of rude wo
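
To illustrate the "you have to teach it" point, here is a minimal sketch of a two-class naive Bayes text classifier in plain Java. It is not the API of Classifier4J or of any anti-spam library mentioned in this thread; the class name `TinyBayes`, its methods, and the add-one smoothing are assumptions chosen for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-class naive Bayes classifier (illustration only). */
final class TinyBayes {
    private final Map<String, Integer> badCounts = new HashMap<>();
    private final Map<String, Integer> okCounts = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();
    private int badTokens, okTokens, badDocs, okDocs;

    /** Lower-cases the text and splits on anything that is not a letter or digit. */
    private static String[] tokens(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Teaches the classifier one labelled example. */
    void learn(String text, boolean inappropriate) {
        Map<String, Integer> counts = inappropriate ? badCounts : okCounts;
        for (String t : tokens(text)) {
            if (t.isEmpty()) continue; // split() can yield a leading empty token
            counts.merge(t, 1, Integer::sum);
            vocab.add(t);
            if (inappropriate) badTokens++; else okTokens++;
        }
        if (inappropriate) badDocs++; else okDocs++;
    }

    /** Compares log-probabilities of the two classes for the given text. */
    boolean isInappropriate(String text) {
        double total = badDocs + okDocs + 2.0;
        double bad = Math.log((badDocs + 1.0) / total); // smoothed class priors
        double ok = Math.log((okDocs + 1.0) / total);
        int v = vocab.size() + 1; // +1 reserves probability mass for unseen words
        for (String t : tokens(text)) {
            if (t.isEmpty()) continue;
            bad += Math.log((badCounts.getOrDefault(t, 0) + 1.0) / (badTokens + v));
            ok += Math.log((okCounts.getOrDefault(t, 0) + 1.0) / (okTokens + v));
        }
        return bad > ok;
    }
}
```

Each call to `learn` updates the word counts, so the classifier can keep learning as moderators label more posts, but it returns nothing useful until it has seen examples of both classes.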

RE: Inappropriate content detection

2006-02-06 Thread Gwyn Carwardine
-Gwyn -----Original Message----- From: Jeff Thorne [mailto:[EMAIL PROTECTED]] Sent: 06 February 2006 13:30 To: java-user@lucene.apache.org Subject: RE: Inappropriate content detection The site will have million+ posts. I am not familiar with Bayesian algorithms. Is there an off-the-shelf API tha

RE: Inappropriate content detection

2006-02-06 Thread Jeff Thorne
:[EMAIL PROTECTED] Sent: Sunday, February 05, 2006 8:40 PM To: java-user@lucene.apache.org Subject: Re: Inappropriate content detection Hi, what scale is this website? Millions of posts or under? Wouldn't it be easier to use a Bayesian algorithm to scan each new post before it is posted to d

Re: Inappropriate content detection

2006-02-05 Thread gekkokid
e" <[EMAIL PROTECTED]> To: Sent: Monday, February 06, 2006 3:56 AM Subject: Inappropriate content detection I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our site faces. Our site allows users to post their opinions on various topics. Due

Re: Inappropriate content detection

2006-02-05 Thread Jeff Rodenburg
You can generate a token stream for a block of text without having to index it. Take a look at the highlighter code; it does this very thing. On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote: > I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our

Re: Inappropriate content detection

2006-02-05 Thread Daniel Noll
Jeff Thorne wrote: I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our site faces. I would like to analyze each user's post for various words and expressions before publishing their post to the DB. I am reading through the Lucene in Action book and
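
The simplest form of the check described in this post, scanning a post for listed words before it is saved, might look like the sketch below. It uses only the Java standard library, not Lucene; the class `PostScanner` and its method names are hypothetical, and it matches whole single words only, so multi-word expressions would need extra handling.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: flags posts that contain any word from a blocklist. */
final class PostScanner {
    private final Set<String> blocked = new LinkedHashSet<>();

    PostScanner(List<String> terms) {
        for (String t : terms) {
            blocked.add(t.toLowerCase(Locale.ROOT)); // match case-insensitively
        }
    }

    /** Returns the blocked words found in the post, in order of first appearance. */
    Set<String> findBlocked(String post) {
        Set<String> hits = new LinkedHashSet<>();
        // Split on anything that is not a letter or digit, so punctuation is ignored.
        for (String token : post.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (blocked.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }
}
```

For example, `new PostScanner(List.of("darn", "heck")).findBlocked("Well, DARN it. Heck!")` returns the set `[darn, heck]`, and a post would be held back from the DB whenever the result is non-empty. Because tokens are compared whole, "darned" does not match "darn"; catching variants would need stemming or a richer analyzer.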

Inappropriate content detection

2006-02-05 Thread Jeff Thorne
I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our site faces. Our site allows users to post their opinions on various topics. Due to government legislation around the world, our management would like us to scan each user's post against various