Jeff Thorne wrote:
I am trying to figure out whether or not Lucene is an appropriate solution for a problem that our site faces.
<cut>
I would like to analyze each user's post for various words and expressions before publishing their post to the DB. I am reading through the Lucene in Action book and it looks as if I cannot analyze a string without first indexing it. If this is true, will indexing each post be a performance hit to the site? I was wondering if someone could shed some light on the best way to tackle this problem with Lucene, or with another API if doing so makes more sense?
You can definitely use Lucene's analyser classes without indexing. Our own application does this when it needs to do things like highlighting text on the screen.
The idea would be you'd have a bunch of terms which are considered nasty, and then every new document would get analysed, and you would look through the terms returned from the analyser for the suspicious ones.
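To make that concrete, here is a minimal sketch of the screening step in plain Java. It is not Lucene code: the analyse method is a simple stand-in (lower-case, split on anything that is not a letter or digit) for whatever terms a real Lucene analyser would return, and the PostScreener class and the blocklist contents are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PostScreener {
    // Hypothetical blocklist; terms are expected to be lower-case already.
    private final Set<String> suspiciousTerms;

    PostScreener(Set<String> suspiciousTerms) {
        this.suspiciousTerms = suspiciousTerms;
    }

    // Stand-in for the analysis step (an assumption, not Lucene's analyser):
    // lower-case the text and split on runs of non-letter, non-digit characters.
    static List<String> analyse(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Returns the suspicious terms found in the post, in order of first appearance.
    Set<String> findSuspicious(String post) {
        Set<String> hits = new LinkedHashSet<>();
        for (String term : analyse(post)) {
            if (suspiciousTerms.contains(term)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PostScreener screener = new PostScreener(Set.of("spam", "scam"));
        // Prints [scam]
        System.out.println(screener.findSuspicious("This is NOT a scam, honest. Scam-free!"));
    }
}
```

Nothing is indexed here; each post is analysed once in memory and the resulting terms are checked against a set, so the cost per post is roughly proportional to its length.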
But no, it certainly isn't something that Lucene as a whole is very good at solving. Lucene is fast for executing a single query against multiple documents, but what you really need is something fast for executing multiple queries against a single document.
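One way to picture "multiple queries against a single document" is to index the expressions instead of the posts. The sketch below is my own illustration in plain Java, not a Lucene feature: the ExpressionIndex class is hypothetical, it only handles exact word sequences, and it assumes the post has already been turned into a list of terms by some analysis step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ExpressionIndex {
    // Maps the first term of each expression to all expressions starting with it.
    private final Map<String, List<List<String>>> byFirstTerm = new HashMap<>();

    // Registers an expression given as its sequence of terms; empty ones are ignored.
    void add(List<String> expression) {
        if (expression.isEmpty()) {
            return;
        }
        byFirstTerm.computeIfAbsent(expression.get(0), k -> new ArrayList<>()).add(expression);
    }

    // Makes one pass over the document's terms. At each position, only the
    // expressions that start with the current term are compared.
    Set<List<String>> match(List<String> docTerms) {
        Set<List<String>> hits = new LinkedHashSet<>();
        for (int i = 0; i < docTerms.size(); i++) {
            List<List<String>> candidates = byFirstTerm.get(docTerms.get(i));
            if (candidates == null) {
                continue;
            }
            for (List<String> expr : candidates) {
                int end = i + expr.size();
                if (end <= docTerms.size() && docTerms.subList(i, end).equals(expr)) {
                    hits.add(expr);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ExpressionIndex index = new ExpressionIndex();
        index.add(List.of("free", "money"));
        index.add(List.of("scam"));
        // Prints [[free, money]]
        System.out.println(index.match(List.of("get", "free", "money", "now")));
    }
}
```

The expressions are set up once, and each new post costs a single pass over its terms no matter how many expressions are registered, as long as few of them share the same first term.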
Daniel

--
Daniel Noll
Nuix Australia Pty Ltd