Jeff Thorne wrote:
I am trying to figure out whether or not Lucene is an appropriate solution
for a problem that our site faces.
<cut>
I would like to analyze each user's post for various words and expressions
before publishing their post to the DB. I am reading through the Lucene in
Action book and it looks as if I cannot analyze a string without first
indexing it. If this is true will indexing each post be a performance hit to
the site? I was wondering if someone could shed some light on the best way
to tackle this problem with Lucene or another api if doing so makes more
sense?

You can definitely use Lucene's analyser classes without indexing. Our own application does this when it needs to, for example, highlight text on the screen.

The idea would be to keep a list of terms you consider nasty, run every new post through the analyser, and check the terms it returns against that list.
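As a rough sketch of that approach, the following runs a string through an analyser and collects any blocked terms, with no index involved. It assumes a reasonably recent Lucene release (one where StandardAnalyzer has a no-argument constructor) and Java 9 or later; the class name PostScreener, the field name "body" and the sample blocklist are made up for illustration. StandardAnalyzer lowercases its tokens, so the blocklist entries are lowercase too.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper: analyses a post in memory and reports blocked terms.
public class PostScreener {

    /** Returns the blocked terms found in the post, in order of first appearance. */
    public static Set<String> findBlockedTerms(Analyzer analyzer, String post,
                                               Set<String> blocked) throws IOException {
        Set<String> hits = new LinkedHashSet<>();
        // The field name is arbitrary here; it only matters for per-field analysers.
        try (TokenStream stream = analyzer.tokenStream("body", post)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                       // required before the first incrementToken()
            while (stream.incrementToken()) {
                String token = term.toString();
                if (blocked.contains(token)) {
                    hits.add(token);
                }
            }
            stream.end();                         // required after the last token
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Set<String> blocked = Set.of("spam", "scam");   // sample list, lowercase
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(findBlockedTerms(analyzer, "This is SPAM, not a scam.", blocked));
        }
    }
}
```

The analyser can be created once and shared between requests, so the per-post cost is just tokenising the text and doing one set lookup per token.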

But no, this isn't a problem that Lucene as a whole is well suited to. Lucene is fast at executing a single query against many documents, whereas what you really need is something fast at executing many queries against a single document.
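To illustrate the "many queries against a single document" shape, here is a minimal plain-Java sketch, not a Lucene feature. All the expressions (single words or multi-word phrases) are loaded up front and keyed by their first token, then each post is scanned once. The class name ExpressionScanner is hypothetical, and the tokenize method is a crude stand-in for a real analyser: it lowercases the text and splits on anything that is not a letter or digit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: matches many words/phrases against one document in a single pass.
public class ExpressionScanner {

    // First token of each expression -> all expressions starting with that token.
    private final Map<String, List<String[]>> byFirstToken = new HashMap<>();

    public ExpressionScanner(List<String> expressions) {
        for (String expression : expressions) {
            String[] tokens = tokenize(expression);
            if (tokens.length > 0) {
                byFirstToken.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens);
            }
        }
    }

    // Stand-in for an analyser: lowercase, split on non-alphanumeric characters.
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    /** Returns every known expression that occurs in the document, sorted alphabetically. */
    public Set<String> scan(String document) {
        String[] doc = tokenize(document);
        Set<String> found = new TreeSet<>();
        for (int i = 0; i < doc.length; i++) {
            List<String[]> candidates = byFirstToken.get(doc[i]);
            if (candidates == null) {
                continue;                         // no expression starts with this token
            }
            for (String[] expression : candidates) {
                if (matchesAt(doc, i, expression)) {
                    found.add(String.join(" ", expression));
                }
            }
        }
        return found;
    }

    // True if the expression's tokens appear consecutively in doc starting at index start.
    private static boolean matchesAt(String[] doc, int start, String[] expression) {
        if (start + expression.length > doc.length) {
            return false;
        }
        for (int j = 0; j < expression.length; j++) {
            if (!doc[start + j].equals(expression[j])) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a scanner built from "free money", "buy now" and "spam" would report "buy now" and "free money" for the post "Buy now, it's FREE money!". The work per post grows with the length of the post rather than with the size of the expression list, which is the property the single-document case needs.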

Daniel


--
Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902
