Re: to filter or not to filter

Erik Hatcher Wed, 17 Aug 2005 18:10:58 -0700

On Aug 17, 2005, at 3:29 PM, Dan Funk wrote:

Currently I'm working with a single index where content is indexed by its original printed page. I have to show the total number of matching documents, so I end up running through all the hits and taking an order of magnitude hit on performance as I calculate the number of unique documents. It's stupid for many, many reasons.
To correct all this, I've decided to create two (maybe three) indexes for the same set of documents: in the first index there is a one-to-one relationship between the original document and the Lucene Document object. The other index is a paragraph index, where each Lucene document represents a single paragraph. I may even throw in a third index where each Lucene document represents a logical section/chapter.
When I'm building the search results page I'll have to execute a fair number of queries. The first query will execute on the Document-Index, then for each of the 10 to 20 results I'm displaying at the time, I'll execute another query to find the best paragraph and/or section.
Is this a reasonable solution to the problem?
Thanks for the advice.

Just one design alternative - a Lucene index does not have to be homogeneous in terms of the fields for a document. So you could index all those various granularities into a single index with an additional field per document indicating whether it is a document, paragraph, or section/chapter.
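
To make the idea concrete, here is a rough sketch of a mixed-granularity index. It is a toy in-memory model in Python, not Lucene API calls, and the field names ("type", "doc_id", "text") and the search helper are hypothetical, chosen only for illustration. The point is that restricting on the type field gives the document count directly, while the same index still answers paragraph-level queries for each displayed result.

```python
# Toy in-memory model of a mixed-granularity index; this is NOT the Lucene API.
# Field names ("type", "doc_id", "text") are hypothetical.
records = [
    {"type": "document", "doc_id": "A", "text": "lucene filters and queries"},
    {"type": "paragraph", "doc_id": "A", "text": "filters narrow a result set"},
    {"type": "paragraph", "doc_id": "A", "text": "queries score documents"},
    {"type": "document", "doc_id": "B", "text": "indexing printed pages"},
    {"type": "paragraph", "doc_id": "B", "text": "each page holds several paragraphs"},
]


def search(term, granularity):
    """Return records of one granularity whose text contains the term."""
    return [
        r for r in records
        if r["type"] == granularity and term in r["text"].split()
    ]


# Document-level hits give the total count directly, with no de-duplication.
doc_hits = search("filters", "document")
total_matching_documents = len(doc_hits)

# Paragraph-level hits for one displayed result, restricted by its doc_id.
best_paragraphs = [
    r for r in search("filters", "paragraph") if r["doc_id"] == "A"
]
```

In a real index the type restriction would presumably be a required term clause or a filter on that extra field, and the per-result paragraph lookup would add a clause on whatever field ties a paragraph back to its parent document.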


    Erik


