Re: Inappropriate content detection

Jeff Rodenburg Sun, 05 Feb 2006 20:51:34 -0800

You can generate a token stream for a block of text without having to index
it. Take a look at the highlighter code; it does this very thing.
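
To make the idea concrete, here is a minimal sketch in plain Java of scanning a
post's tokens against a keyword list with no index involved. It is a stand-in,
not the Lucene Analyzer API: the class name PostScanner, the methods tokenize
and findFlagged, and the sample blocklist are all hypothetical, and the
tokenizer is a simple lowercase-and-split that a real analyzer would replace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of Lucene.
class PostScanner {

    // Lowercase the text and split on runs of non-letter, non-digit characters.
    // This only approximates what a simple analyzer would emit as tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Return every token of the post that appears in the blocklist, in order.
    static List<String> findFlagged(String text, Set<String> blocklist) {
        List<String> hits = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (blocklist.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Placeholder blocklist; a real list would come from your own policy.
        Set<String> blocklist = Set.of("badword", "slur");
        // prints [badword]
        System.out.println(findFlagged("This post has a BadWord in it.", blocklist));
    }
}
```

Single-word matching like this will miss multi-word expressions and inflected
forms. That is where running the text through a real analyzer (stemming, stop
words) and comparing the resulting token stream would likely pay off.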




On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote:
>
> I am trying to figure out whether Lucene is an appropriate solution for a
> problem that our site faces. Our site allows users to post their opinions
> on various topics. Due to government legislation around the world, our
> management would like us to scan each user's post against keywords that
> would indicate inappropriate content. We are looking for racial slurs,
> profanity, and attacks against sexual orientation. Each user's post is
> generally no more than a few paragraphs.
>
>
>
> I would like to analyze each user's post for various words and expressions
> before publishing it to the DB. I am reading through the Lucene in Action
> book, and it looks as if I cannot analyze a string without first indexing
> it. If this is true, will indexing each post be a performance hit to the
> site? I was wondering if someone could shed some light on the best way to
> tackle this problem with Lucene, or with another API if doing so makes
> more sense.
>
>
>
> Thanks,
>
> Jeff
>
>
>
>
>
