Hi AJ -
Performance would depend on the kind of queries you are going to perform
against sentences. If you are going to be querying for phrases
(multi-token), want to make use of stemming, or want any kind of term
expansion (wildcards, synonyms, etc.), I imagine Lucene would be much
superior, but I
Hi Marc,
Thanks for your suggestions. Marking sentences in documents and using span
queries is a good approach. How does its performance compare to a database
approach? For example, sentences could be stored in MySQL, one sentence per
row, and searched with MySQL's full-text search feature.
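To make the "mark sentences and use span queries" idea concrete, here is a minimal plain-Java sketch with no Lucene dependency. It mimics the check a span query would perform: a hypothetical boundary token `$SENT$` is placed between sentences, and two terms count as a same-sentence match only if they are close enough and no boundary token lies between them. In Lucene itself, one common way to express this is a `SpanNearQuery` over the two terms wrapped in a `SpanNotQuery` that excludes the boundary term; the class, method, and token names below are illustrative assumptions, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSpanSketch {
    // Hypothetical boundary marker. Because real tokens are produced by
    // splitting on non-word characters, "$SENT$" can never occur in the text.
    static final String BOUNDARY = "$SENT$";

    // Flattens one document's sentences into a single token stream,
    // inserting BOUNDARY between consecutive sentences.
    static List<String> tokenize(List<String> sentences) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            if (i > 0) tokens.add(BOUNDARY);
            for (String t : sentences.get(i).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
        }
        return tokens;
    }

    // True if terms a and b occur (in either order) with at most `slop`
    // tokens between them and no BOUNDARY token in between, i.e. roughly
    // what SpanNotQuery(SpanNearQuery(a, b), boundary) matches.
    static boolean sameSentence(List<String> tokens, String a, String b, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (i == j || !tokens.get(j).equals(b)) continue;
                int lo = Math.min(i, j), hi = Math.max(i, j);
                if (hi - lo - 1 > slop) continue; // too far apart
                boolean crosses = false;
                for (int k = lo + 1; k < hi; k++) {
                    if (tokens.get(k).equals(BOUNDARY)) { crosses = true; break; }
                }
                if (!crosses) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize(List.of("Lucene indexes text.", "MySQL stores rows."));
        System.out.println(sameSentence(tokens, "lucene", "text", 10)); // true
        System.out.println(sameSentence(tokens, "lucene", "rows", 10)); // false: crosses a boundary
    }
}
```

The design choice here is the same one the marker approach makes: the whole document stays one unit, and sentence membership is decided at query time by refusing matches that span a boundary.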
Hi AJ -
Depending on your needs, you could create a Lucene document for each
sentence (in which case searching and returning sentences is trivial),
or create a Lucene document for each of your documents, with embedded
sentence start/stop markers (as a special symbol). Or, instead of a
special
I'd appreciate any advice on whether Lucene is appropriate for indexing and
searching sentences. I have millions of documents broken down into millions of
sentences. No sentence exists as a separate document; all these sentences
are in a small number of big files. How can I use Lucene to index/search the
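Since the sentences live in a few big files, the one-Lucene-document-per-sentence option from the reply needs a splitting step before indexing. A minimal sketch, assuming the text of each source document can be handled as a `String`, is to use the JDK's `java.text.BreakIterator` sentence instance; the `SentenceSplitter` class and `split` method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    // Splits text into sentences using the JDK's rule-based sentence BreakIterator.
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) range is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim(); // drop trailing whitespace
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. It is written in Java! Is MySQL faster?";
        for (String s : split(text, Locale.US)) {
            System.out.println(s);
        }
    }
}
```

Each returned string could then be added as its own Lucene document, perhaps with a stored field recording the source file and position so results can be traced back (a hypothetical design choice). BreakIterator's rules are heuristic, so abbreviations such as "Dr." followed by a capitalized word can produce spurious breaks, and for very big files it would be better to read and split one source document at a time rather than loading a whole file into memory.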