Jason Polites wrote:
There is also an open source Java anti-spam API which does a Bayesian
scan of email content (plus other stuff).
You could retrofit it to work with raw text.
There is also Classifier4J, which is more geared toward pure
classification (comes with a Bayesian classifier but oth
From: "Gwyn Carwardine" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, February 07, 2006 12:58 AM
Subject: RE: Inappropriate content detection
The good bit about Bayesian is that it continuously learns.
The downside is that you have to teach it.
Not quite as simple as a list of rude wo
-Gwyn
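
To make the "learns, but has to be taught" point concrete, here is a minimal sketch of a two-class naive Bayes text classifier in plain Java. It is not the API of Classifier4J or of any anti-spam library mentioned in this thread; the class name TinyBayes, its method names, and the add-one smoothing are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene or Classifier4J.
class TinyBayes {
    private final Map<String, Integer> badCounts = new HashMap<>();
    private final Map<String, Integer> okCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int badTokens, okTokens, badDocs, okDocs;

    // "Teaching": record the word counts of one hand-labelled post.
    void train(String text, boolean inappropriate) {
        Map<String, Integer> counts = inappropriate ? badCounts : okCounts;
        for (String tok : tokens(text)) {
            counts.merge(tok, 1, Integer::sum);
            vocabulary.add(tok);
            if (inappropriate) badTokens++; else okTokens++;
        }
        if (inappropriate) badDocs++; else okDocs++;
    }

    // Estimated probability that a post is inappropriate, given what was taught so far.
    double probabilityInappropriate(String text) {
        // Class priors with add-one smoothing, kept in log space to avoid underflow.
        double logBad = Math.log((badDocs + 1.0) / (badDocs + okDocs + 2.0));
        double logOk = Math.log((okDocs + 1.0) / (badDocs + okDocs + 2.0));
        int v = vocabulary.size() + 1; // one extra slot for words never seen in training
        for (String tok : tokens(text)) {
            logBad += Math.log((badCounts.getOrDefault(tok, 0) + 1.0) / (badTokens + v));
            logOk += Math.log((okCounts.getOrDefault(tok, 0) + 1.0) / (okTokens + v));
        }
        // Turn the two log scores into a probability for the "inappropriate" class.
        return 1.0 / (1.0 + Math.exp(logOk - logBad));
    }

    // Lowercase the text and split on anything that is not a letter or digit.
    private static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty()) out.add(raw);
        }
        return out;
    }

    public static void main(String[] args) {
        TinyBayes b = new TinyBayes();
        b.train("buy cheap pills now", true);
        b.train("lucene index search question", false);
        System.out.println(b.probabilityInappropriate("cheap pills"));
    }
}
```

Every call to train() shifts the counts, which is the continuous learning part. Until someone has labelled a reasonable number of posts, the scores stay close to 0.5 and are of little use, which is the downside described above.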
-----Original Message-----
From: Jeff Thorne [mailto:[EMAIL PROTECTED]
Sent: 06 February 2006 13:30
To: java-user@lucene.apache.org
Subject: RE: Inappropriate content detection
The site will have a million+ posts. I am not familiar with Bayesian
algorithms. Is there an off the shelf API tha
:[EMAIL PROTECTED]
Sent: Sunday, February 05, 2006 8:40 PM
To: java-user@lucene.apache.org
Subject: Re: Inappropriate content detection
Hi, what scale is this website? Millions of posts or under?
Wouldn't it be easier to use a Bayesian algorithm to scan each new post
before it is posted to d
e" <[EMAIL PROTECTED]>
To:
Sent: Monday, February 06, 2006 3:56 AM
Subject: Inappropriate content detection
I am trying to figure out whether or not Lucene is an appropriate solution
for a problem that our site faces. Our site
allows users to post their opinions on various topics. Due
You can generate a token stream for a block of text without having to index
it. Take a look at the highlighter code, it does this very thing.
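
The same idea, checking a block of text token by token without building an index, can be sketched without Lucene at all. The example below is only a stand-in: a regex split plays the role that a Lucene analyzer's token stream would play, and the class name PostScanner, the method names, and the two-entry block list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; a real setup would take tokens from a Lucene analyzer.
class PostScanner {
    // Placeholder block list; a real deployment would load this from configuration.
    static final Set<String> BLOCKED = Set.of("badword", "worseword");

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty()) tokens.add(raw);
        }
        return tokens;
    }

    // Return every token of the post that appears on the block list, in order.
    static List<String> findBlocked(String text) {
        List<String> hits = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (BLOCKED.contains(token)) hits.add(token);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [badword]
        System.out.println(findBlocked("This post has a BadWord in it."));
    }
}
```

A post would be held back before it reaches the DB whenever findBlocked returns a non-empty list. Swapping tokenize for a real analyzer would add stemming and stop-word handling, which a plain split does not provide.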
On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote:
>
> I am trying to figure out whether or not Lucene is an appropriate solution
> for a problem that our
Jeff Thorne wrote:
I am trying to figure out whether or not Lucene is an appropriate solution
for a problem that our site faces.
I would like to analyze each user's post for various words and expressions
before publishing their post to the DB. I am reading through the Lucene in
Action book and
I am trying to figure out whether or not Lucene is an appropriate solution
for a problem that our site faces. Our site
allows users to post their opinions on various topics. Due to various
government legislation around the world, our management would like us to
scan each user's post against various