Caching Filters and docIds when using MultiSearcher/IndexSearcher(MultiReader)...

Antony Bowesman Thu, 11 Sep 2008 19:58:34 -0700

Up to now I have only needed to search a single index, but now I will have many index shards to search across. My existing search maintained cached filters for the index as well as a cache of my own unique ID fields in the index, keyed by Lucene doc id.

Now that I need to search multiple indices, I am trying to work out how to continue to use these caches.

I have one index per month of data (up to 10M docs per month), and users can search across whichever date range they want, so one search may cover indices 1-12 (e.g. Jan07-Dec07) and another indices 13-20 (Jan08-Aug08).
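To make the layout concrete, here is a minimal sketch of choosing which monthly indices a date-range search has to open. It assumes, purely for illustration, that each month's index is named after its month in `yyyy-MM` form; `MonthlyShards` and `shardsFor` are hypothetical names, not Lucene API.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Lucene API): one index per month, named "yyyy-MM". */
final class MonthlyShards {
    private MonthlyShards() {}

    /** Returns the index names covering [from, to] inclusive, oldest first. */
    static List<String> shardsFor(YearMonth from, YearMonth to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' is before 'from'");
        }
        List<String> names = new ArrayList<>();
        for (YearMonth m = from; !m.isAfter(to); m = m.plusMonths(1)) {
            names.add(m.toString()); // YearMonth.toString() gives e.g. "2007-01"
        }
        return names;
    }
}
```

For Jan07-Dec07 this yields the twelve names `2007-01` to `2007-12`, and the caller would then open one reader or searcher per name, in that order.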

It makes no sense to cache a single BitSet generated from a MultiReader over indices 1-12 when the next search could be for indices 2-11: dropping index 1 shifts every remaining doc id, so all the bits would be useless. To be of any use, caches (including cached BitSets) should therefore contain the doc ids specific to each individual index rather than to any particular MultiReader. My Filter implementation can then determine the real doc id and delegate to the bitset for the particular reader instance.
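A minimal sketch of such a per-index cache, using only `java.util.BitSet`. It relies on MultiReader numbering documents by concatenating its sub-readers in constructor order, so a document's global id is its local id plus the sum of `maxDoc()` of the sub-readers before it. `PerIndexFilterCache`, `combine` and the `yyyy-MM` index names are hypothetical; the sketch only shows how cached per-index bits could be reassembled for whichever combination a search uses, not a complete Filter.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical per-index filter cache (not Lucene API). Each BitSet is keyed
 * by index name and holds that index's own local doc ids, so it stays valid
 * whichever MultiReader combination a later search runs against.
 */
final class PerIndexFilterCache {
    private final Map<String, BitSet> bitsByIndex = new HashMap<>();

    /** Stores a copy of the filter bits computed against a single index. */
    void put(String indexName, BitSet localBits) {
        bitsByIndex.put(indexName, (BitSet) localBits.clone());
    }

    /**
     * Builds the bits for a multi-index reader. indexNames and maxDocs must be
     * in the order the sub-readers were given to the MultiReader; each index's
     * local ids are shifted by the total maxDoc() of the indices before it.
     * Simplification: an index with no cached entry contributes no bits.
     */
    BitSet combine(List<String> indexNames, int[] maxDocs) {
        if (indexNames.size() != maxDocs.length) {
            throw new IllegalArgumentException("names and maxDocs differ in length");
        }
        BitSet result = new BitSet();
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            BitSet local = bitsByIndex.get(indexNames.get(i));
            if (local != null) {
                for (int d = local.nextSetBit(0); d >= 0 && d < maxDocs[i]; d = local.nextSetBit(d + 1)) {
                    result.set(base + d);
                }
            }
            base += maxDocs[i];
        }
        return result;
    }
}
```

Because the bits are stored per index, a search over 2007-02 to 2007-11 can reuse the same cache entries as one over 2007-01 to 2007-12; only the cheap offset-and-copy step is repeated.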

This means I need to find the original reader/searcher instance, and the doc id within that instance, to perform bitset checks or cache lookups.

MultiSearcher has subDoc() and subSearcher(), but there is no such beast for an IndexReader to find the real reader and doc id from the pseudo one.
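The missing mapping can be computed by hand. A hedged sketch, assuming you keep the sub-readers' `maxDoc()` values in the same order they were passed to the MultiReader: a binary search over the cumulative start offsets gives the sub-index, and subtracting its start gives the local doc id, the same information MultiSearcher's `subSearcher()` and `subDoc()` return. `DocIdMapper` is a hypothetical class, not part of Lucene. `maxDoc()` rather than `numDocs()` is the right size to use, since deleted documents still occupy doc ids.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not Lucene API) that maps a MultiReader's global doc
 * id back to (sub-index, local doc id), like MultiSearcher's subSearcher()
 * and subDoc(). maxDocs must list each sub-reader's maxDoc() in the order
 * the sub-readers were passed to the MultiReader.
 */
final class DocIdMapper {
    private final int[] starts; // starts[i] = first global doc id of sub-index i; starts[count] = total
    private final int count;    // number of sub-indices

    DocIdMapper(int[] maxDocs) {
        count = maxDocs.length;
        starts = new int[count + 1];
        for (int i = 0; i < count; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Position of the sub-index holding globalDoc. */
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= starts[count]) {
            throw new IndexOutOfBoundsException("doc " + globalDoc);
        }
        int pos = Arrays.binarySearch(starts, 0, count, globalDoc);
        // Not found: the insertion point is -pos - 1, and the owner is the entry before it.
        int i = pos >= 0 ? pos : -pos - 2;
        // Empty sub-indices share a start with their successor; advance to the
        // last start at or before globalDoc, which belongs to a non-empty index.
        while (i + 1 < count && starts[i + 1] <= globalDoc) {
            i++;
        }
        return i;
    }

    /** Doc id of globalDoc within its own sub-index. */
    int subDoc(int globalDoc) {
        return globalDoc - starts[subIndex(globalDoc)];
    }
}
```

With the sub-index and local id in hand, the Filter can look up the per-index bitset, and the unique-ID cache can be consulted the same way.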

This also raises the question of MultiSearcher vs IndexSearcher(MultiReader). Even after reading the archives, I am unsure which I should use; there seem to be comments on the dev list advising against MultiSearcher...


Any thoughts, or have I spiralled too far into Lucene's depths to see where I am...?

Antony





