Jason Polites wrote:
There is also an open source Java anti-spam API which does a Bayesian
scan of email content (plus other stuff).
You could retrofit it to work with raw text.
There is also Classifier4J, which is more geared toward pure
classification (comes with a Bayesian classifier but oth
From: "Gwyn Carwardine" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, February 07, 2006 12:58 AM
Subject: RE: Inappropriate content detection
The good bit about Bayesian is that it continuously learns.
The downside is that you have to teach it.
Not quite as simple as a list of rude wo
-Gwyn
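
The "learns, but has to be taught" trade-off can be illustrated with a minimal naive Bayes sketch. It is not the API of the anti-spam library or of Classifier4J mentioned in this thread; the class `PostFilter` and its methods `train`, `logOdds` and `isInappropriate` are hypothetical names, and the tokenizer and add-one smoothing are simplifying assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-class naive Bayes filter; an illustration, not a library API. */
final class PostFilter {
    private final Map<String, Integer> badCounts = new HashMap<>();
    private final Map<String, Integer> okCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int badTokens = 0;
    private int okTokens = 0;
    private int badPosts = 0;
    private int okPosts = 0;

    /** Crude tokenizer (assumption): lower-case, split on anything but a-z, 0-9 and '. */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+");
    }

    /** "Teaching" step: every labelled post updates the per-class word counts. */
    void train(String text, boolean inappropriate) {
        Map<String, Integer> counts = inappropriate ? badCounts : okCounts;
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            counts.merge(token, 1, Integer::sum);
            vocabulary.add(token);
            if (inappropriate) {
                badTokens++;
            } else {
                okTokens++;
            }
        }
        if (inappropriate) {
            badPosts++;
        } else {
            okPosts++;
        }
    }

    /** Log-odds that the text is inappropriate; a positive value favours "inappropriate". */
    double logOdds(String text) {
        if (badPosts == 0 || okPosts == 0) {
            throw new IllegalStateException("train at least one post of each class first");
        }
        double vocabSize = vocabulary.size();
        double score = Math.log((double) badPosts / okPosts); // class prior
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            // Add-one (Laplace) smoothing so unseen words do not zero out a class.
            double pBad = (badCounts.getOrDefault(token, 0) + 1.0) / (badTokens + vocabSize);
            double pOk = (okCounts.getOrDefault(token, 0) + 1.0) / (okTokens + vocabSize);
            score += Math.log(pBad / pOk);
        }
        return score;
    }

    boolean isInappropriate(String text) {
        return logOdds(text) > 0.0;
    }
}
```

In use, a moderator's accept/reject decisions would be fed back through `train`, which is where the continuous learning comes from; with very little training data the scores for unseen words are not meaningful.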
-Original Message-
From: Jeff Thorne [mailto:[EMAIL PROTECTED]
Sent: 06 February 2006 13:30
To: java-user@lucene.apache.org
Subject: RE: Inappropriate content detection
The site will have a million+ posts. I am not familiar with Bayesian
algorithms. Is there an off-the-shelf API tha
:[EMAIL PROTECTED]
Sent: Sunday, February 05, 2006 8:40 PM
To: java-user@lucene.apache.org
Subject: Re: Inappropriate content detection
Hi, what scale is this website? Millions of posts or under?
Wouldn't it be easier to use a Bayesian algorithm to scan each new post
before it is posted to d
Hi, what scale is this website? Millions of posts or under?
Wouldn't it be easier to use a Bayesian algorithm to scan each new post
before it is posted, to detect whether it is acceptable or not? Just a quick
idea off the top of my head.
_gk
- Original Message -
From: "Jeff Thorne" <[EMAIL PRO
You can generate a token stream for a block of text without having to index
it. Take a look at the highlighter code; it does this very thing.
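
The general idea, tokenizing a post in memory and inspecting the tokens without ever building an index, can be sketched in plain Java. This sketch deliberately does not use Lucene's analyzer or token stream classes; the regex split is a crude stand-in for a real analyzer, and `WordListScanner`, `tokens` and `hits` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical word-list check over an in-memory token list; no index involved. */
final class WordListScanner {
    private final Set<String> blocked; // assumed to hold lower-case words

    WordListScanner(Set<String> blocked) {
        this.blocked = blocked;
    }

    /** Stand-in tokenizer (assumption): lower-case, split on non-letter/non-digit runs. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    /** Returns the blocked words found in the post, in order of appearance. */
    List<String> hits(String post) {
        List<String> found = new ArrayList<>();
        for (String token : tokens(post)) {
            if (blocked.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

A post would be checked with `hits(post)` before it is written to the DB; a real analyzer would add stemming and stop-word handling that this stand-in lacks.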
On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote:
>
> I am trying to figure out whether or not Lucene is an appropriate solution
> for a problem that our
Jeff Thorne wrote:
I am trying to figure out whether or not Lucene is an appropriate solution
for a problem that our site faces.
I would like to analyze each user's post for various words and expressions
before publishing their post to the DB. I am reading through the Lucene in
Action book and