Up to now I have only needed to search a single index, but I will now have many index shards to search across. My existing search maintained cached filters for the index, as well as a cache of my own unique ID fields in the index, keyed by Lucene doc id.

Now that I need to search multiple indices, I am trying to work out how to continue to use these caches.

I have one index per month of data (up to 10M docs per month) and users can search across whichever date range they want, so one search may cover indices 1-12 (e.g. Jan07-Dec07) and another indices 13-20 (Jan08-Aug08).

It makes no sense to cache a single BitSet generated from a MultiReader over indices 1-12 when the next search could be for indices 2-11: the doc ids would shift and all the bits would be useless. To be of any use, caches, including cached BitSets, should therefore hold doc ids specific to a particular index rather than to any particular MultiReader. My Filter implementation can then determine the real doc id and delegate to the BitSet for the particular reader instance.
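
To make the idea concrete, here is a minimal sketch of a per-index cache in plain Java. It is not Lucene API: the class PerIndexBitSetCache and its put/combine methods are hypothetical names of mine, and it assumes the combined reader numbers documents by laying the selected indices end to end, each offset by the sum of the maxDoc values before it.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-index filter cache; illustrative only, not part of Lucene. */
class PerIndexBitSetCache {
    // Cached filter bits per monthly index, in that index's own doc id space.
    private final Map<String, BitSet> bitsByIndex = new HashMap<>();
    // maxDoc of each index, needed to compute offsets in the combined space.
    private final Map<String, Integer> maxDocByIndex = new HashMap<>();

    void put(String indexName, int maxDoc, BitSet localBits) {
        bitsByIndex.put(indexName, localBits);
        maxDocByIndex.put(indexName, maxDoc);
    }

    /** Builds bits in the combined doc id space for the selected indices, in order. */
    BitSet combine(List<String> selectedIndices) {
        BitSet combined = new BitSet();
        int offset = 0; // assumed start of the current index in the combined space
        for (String name : selectedIndices) {
            BitSet local = bitsByIndex.get(name);
            if (local == null) {
                throw new IllegalArgumentException("no cached bits for " + name);
            }
            // Shift every set bit by the offset of its index.
            for (int doc = local.nextSetBit(0); doc >= 0; doc = local.nextSetBit(doc + 1)) {
                combined.set(offset + doc);
            }
            offset += maxDocByIndex.get(name);
        }
        return combined;
    }
}
```

The per-index BitSets stay valid whichever date range is searched; only the cheap shift into the combined space is repeated per search. A Filter could equally skip the combine step and consult the per-index bits directly once it knows which index a doc id belongs to.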

This means I need to find the original reader/searcher instance and the particular doc Id from that instance to perform bitset checks or cache lookups.

In MultiSearcher there are subDoc and subSearcher, but there's no such beast for an IndexReader to find the real reader and doc id from the pseudo ones.
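
What I am after on the reader side could be sketched roughly as below. This is plain Java rather than Lucene code: DocIdResolver, subIndex and subDoc are hypothetical names chosen to echo the MultiSearcher methods, and the sketch assumes the combined doc id of a document is its local doc id plus the sum of the maxDoc values of the indices before it.

```java
import java.util.Arrays;

/** Hypothetical helper; maps a combined doc id back to (sub-index, local doc id). */
class DocIdResolver {
    private final int[] starts; // starts[i] = first combined doc id of sub-index i
    private final int total;    // total number of doc ids across all sub-indices

    DocIdResolver(int[] maxDocs) {
        starts = new int[maxDocs.length];
        int offset = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = offset;
            offset += maxDocs[i];
        }
        total = offset;
    }

    /** Returns the position of the sub-index that contains the combined doc id. */
    int subIndex(int doc) {
        if (doc < 0 || doc >= total) {
            throw new IllegalArgumentException("doc id out of range: " + doc);
        }
        int pos = Arrays.binarySearch(starts, doc);
        if (pos < 0) {
            return -pos - 2; // insertion point minus one: last start <= doc
        }
        // Exact hit: empty sub-indices share a start value, so take the last one.
        while (pos + 1 < starts.length && starts[pos + 1] == starts[pos]) {
            pos++;
        }
        return pos;
    }

    /** Returns the doc id local to the sub-index that contains the combined doc id. */
    int subDoc(int doc) {
        return doc - starts[subIndex(doc)];
    }
}
```

With the resolved sub-index and local doc id, a Filter or ID cache lookup can go straight to the per-index structure, regardless of which set of indices the current search happens to span.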

This also raises the question of MultiSearcher vs IndexSearcher(MultiReader). Even after reading the archives, I am unsure which I should use; there seem to be comments on the dev list advising against MultiSearcher...

Any thoughts, or have I spiralled too far into Lucene's depths to see where I am...?

Antony





