You can generate a token stream for a block of text without having to index it. Take a look at the highlighter code, it does this very thing.
On 2/5/06, Jeff Thorne <[EMAIL PROTECTED]> wrote: > > I am trying to figure out whether or not Lucene is an appropriate solution > for a problem that our site faces. Our site > > allows users to post their opinions on various topics. Due to various > government legislations around the world our management would like us to > scan each users post against various keywords that would indicate > inappropriate content > > in the users posting. We are looking for racial slurs, profanity and > attacks > against sexual orientation. Each users posting is > > generally not more that a few paragraphs. > > > > I would like to analyze each users post for various words and expressions > before publishing their post to the DB. I am reading through the Lucene in > action book and it looks as if I cannot analyze a string without first > indexing it. If this is true will indexing each post be a performance hit > to > the site? I was wondering if someone could shed some light on the best way > to tackle this problem with Lucene or another api if doing so makes more > sense? > > > > Thanks, > > Jeff > > > > >