Re: Inappropriate content detection

gekkokid Sun, 05 Feb 2006 22:18:53 -0800

Hi, what scale is this website? Millions of posts, or fewer?

Wouldn't it be easier to use a Bayesian algorithm to scan each new post before it is posted, to detect whether it is acceptable or not? Just a quick idea off the top of my head.
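
A rough sketch of the Bayesian idea, in plain Java with no Lucene involved. It is a tiny multinomial naive Bayes classifier: you train it on posts that moderators have already labelled, and it then guesses whether a new post looks more like the acceptable ones or the inappropriate ones. The class name, method names, tokenizing rule and smoothing choice are all my own assumptions for illustration, not anything taken from Lucene or from the site in question.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Minimal naive Bayes sketch; every name in here is hypothetical. */
class NaiveBayesPostFilter {
    // Per-word counts by class: index 0 = acceptable, index 1 = inappropriate.
    private final Map<String, int[]> wordCounts = new HashMap<>();
    private final int[] totalWords = new int[2];
    private final int[] totalPosts = new int[2];

    /** Lower-cases the text and splits on runs of anything that is not a letter, digit, or apostrophe. */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+");
    }

    /** Adds one moderator-labelled post to the training counts. */
    void train(String post, boolean inappropriate) {
        int c = inappropriate ? 1 : 0;
        totalPosts[c]++;
        for (String w : tokenize(post)) {
            if (w.isEmpty()) continue; // split() can yield a leading empty token
            wordCounts.computeIfAbsent(w, k -> new int[2])[c]++;
            totalWords[c]++;
        }
    }

    /** Log-probability score of the post under class c, using add-one smoothing. */
    private double logScore(String post, int c) {
        // The extra +1 in the denominator reserves a slot for unseen words
        // and keeps the division safe before any training has happened.
        double denominator = totalWords[c] + wordCounts.size() + 1.0;
        double score = Math.log((totalPosts[c] + 1.0) / (totalPosts[0] + totalPosts[1] + 2.0));
        for (String w : tokenize(post)) {
            if (w.isEmpty()) continue;
            int[] counts = wordCounts.get(w);
            int n = (counts == null) ? 0 : counts[c];
            score += Math.log((n + 1.0) / denominator);
        }
        return score;
    }

    /** True when the post scores higher as inappropriate than as acceptable. */
    boolean isInappropriate(String post) {
        return logScore(post, 1) > logScore(post, 0);
    }
}
```

You would call train() once for each post that has already been moderated, then call isInappropriate() on each new post before writing it to the DB. With only a handful of training posts the guesses will be poor, so it would probably make sense to send flagged posts to a human reviewer rather than reject them outright.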

_gk

----- Original Message -----
From: "Jeff Thorne" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Monday, February 06, 2006 3:56 AM
Subject: Inappropriate content detection

I am trying to figure out whether or not Lucene is an appropriate solution
for a problem that our site faces. Our site allows users to post their
opinions on various topics. Due to various government legislation around
the world, our management would like us to scan each user's post against
various keywords that would indicate inappropriate content in the user's
posting. We are looking for racial slurs, profanity, and attacks against
sexual orientation. Each user's posting is generally not more than a few
paragraphs.



I would like to analyze each user's post for various words and expressions
before publishing their post to the DB. I am reading through the Lucene in
Action book and it looks as if I cannot analyze a string without first
indexing it. If this is true, will indexing each post be a performance hit
to the site? I was wondering if someone could shed some light on the best
way to tackle this problem with Lucene, or with another API if doing so
makes more sense?



Thanks,

Jeff



