Re: use Lucene to index sentences

2006-02-06 Thread Marc Hadfield
Hi AJ - Performance would depend on the kind of queries you are going to perform against sentences. If you are going to be querying for phrases (multi-token), or want to make use of stemming or any kind of term expansion (wildcards, synonyms, etc.), I imagine Lucene would be much superior, but I

Re: use Lucene to index sentences

2006-02-06 Thread AJ Chen
Hi Marc, Thanks for your suggestions. Marking sentences in documents and using span queries is a good approach. How does its performance compare to a database approach? For example, sentences can be stored in MySQL, one sentence per row, and they can be searched by MySQL's full-text search feature

Re: use Lucene to index sentences

2006-02-06 Thread Marc Hadfield
Hi AJ - Depending on your needs, you could create a Lucene document for each sentence (in which case searching and returning sentences is trivial), or create a Lucene document for each of your documents, with embedded sentence start/stop markers (as a special symbol). Or, instead of a special
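
A rough sketch of the first option above (one index entry per sentence), for illustration only. It is plain Python with a toy inverted index, not the Lucene API; the splitter, the function names, and the sample documents are all assumptions made up for this example.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; no stemming or synonym expansion."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_sentence_index(docs):
    """Treat every sentence as its own 'document' in an inverted index."""
    sentences = []               # sentence id -> (doc id, sentence text)
    postings = defaultdict(set)  # term -> set of sentence ids
    for doc_id, text in docs.items():
        for sentence in split_sentences(text):
            sid = len(sentences)
            sentences.append((doc_id, sentence))
            for term in tokenize(sentence):
                postings[term].add(sid)
    return sentences, postings


def search(sentences, postings, query):
    """Return the sentences that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(postings.get(t, set()) for t in terms))
    return [sentences[sid] for sid in sorted(hits)]


# Hypothetical sample data.
docs = {
    "doc1": "Lucene indexes text. Span queries match nearby terms.",
    "doc2": "MySQL offers full text search. Lucene supports stemming.",
}
sentences, postings = build_sentence_index(docs)
results = search(sentences, postings, "lucene text")
```

Because each hit is already a single sentence tagged with its source document, returning sentences needs no extra bookkeeping. The cost is an index with one entry per sentence rather than one per document.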

use Lucene to index sentences

2006-02-06 Thread AJ Chen
I'd appreciate any advice on whether Lucene is appropriate for indexing and searching sentences. I have millions of documents broken down into millions of sentences. Each sentence does not exist as a document. All these sentences are in a small number of big files. How can I use Lucene to index/search the