Carlos,

It sounds like you'll have to build logic that detects when the index has been reopened and repopulates your cache. Take a look at Solr; it does this kind of thing.
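
A rough sketch of that idea, in plain Java with no Lucene dependency so it stands on its own. Everything named here (StoreIdCache, storesFor, reloadCount) is hypothetical and made up for illustration. The version supplier is assumed to wrap something like the version of the reader currently in use, and the loader is assumed to build a maxDoc()-sized array mapping doc id to store_id from that same reader.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper: holds a doc id -> store_id array and rebuilds it
 * whenever the reported index version changes (i.e. after a reopen).
 */
class StoreIdCache {
    private final LongSupplier indexVersion; // assumption: reports the current reader's version
    private final Supplier<int[]> loader;    // assumption: builds a maxDoc()-sized array
    private long cachedVersion;
    private int[] storeIds;                  // null until the first load
    private int reloads;

    StoreIdCache(LongSupplier indexVersion, Supplier<int[]> loader) {
        this.indexVersion = indexVersion;
        this.loader = loader;
    }

    /** Returns the array for the current index version, reloading it if stale. */
    synchronized int[] current() {
        long version = indexVersion.getAsLong();
        if (storeIds == null || version != cachedVersion) {
            storeIds = loader.get();
            cachedVersion = version;
            reloads++;
        }
        return storeIds;
    }

    /** Number of times the array has been (re)built; handy for testing. */
    synchronized int reloadCount() {
        return reloads;
    }

    /** Distinct store ids among the given doc ids, e.g. the first 1000 hits. */
    Set<Integer> storesFor(int[] docIds) {
        int[] ids = current(); // take one snapshot for the whole pass
        Set<Integer> stores = new TreeSet<>();
        for (int doc : docIds) {
            // Skip ids outside the snapshot instead of failing mid-pass.
            if (doc >= 0 && doc < ids.length) {
                stores.add(ids[doc]);
            }
        }
        return stores;
    }
}
```

In a HitCollector you would gather the doc ids (or look them up directly in the array returned by current()) and never load the documents themselves. Note that doc ids are only meaningful for the reader they came from, so the array should be rebuilt from the same reader that runs the search.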

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Carlos Pita <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, May 24, 2007 12:50:04 PM
Subject: Re: HitCollector or Hits

Hi Erick,

I don't think that FieldSelector would be that valuable in my case,
because I just need to access a few fields, and those are all fields
that are in fact stored (and indexed too). I was thinking of keeping
this extra information in memory, specifically in an array mapping doc
ids to the data structure. I see that this is done for
ScoreDocComparator in a Lucene in Action example. I'm still not sure how
to achieve something similar with a HitCollector. I mean, I could
instantiate a maxDoc()-sized array and index it by the document ids that
are passed to the collector. But that said, I don't know how to keep
this array synchronized with the index. I've opened a new thread for
this subject, "maxDoc and arrays". Thank you again.

Cheers,
Carlos

On 5/24/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> You're on the right track. But that said, access to anything that's
> indexed (stored or not) should be pretty quick. Things
> stored, but not indexed, are costlier. This might drive your
> decision on what to index vs. store.
>
> Loading the document is anything like IndexReader.document(), or
> Hits.doc().
>
> Part of the difference is that if you load the document, you get
> all the fields, whether you need them or not.
>
> Also, you can use your own TermEnum/TermDocs lookup for
> this kind of thing if the terms you're interested in are indexed...
>
> I wrote a mail some time ago that detailed my experience, in my
> situation with my peculiar data set, that you may want to read;
> see...
>
> Lucene 2.1, using FieldSelector speeds up my app by a factor of 10+,
>
> As I mentioned in that message, I suspect that my improvement was
> *highly* dependent upon how the index is structured.
>
> All that said, your notion of benchmarking is a very good one. It led
> me to use FieldSelector in the first place...
>
> Best
> Erick
>
> On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote:
> >
> > Hi Erick,
> >
> > thank you for your prompt answer. What do you mean by loading the
> > document? Accessing one of the stored fields? In that case I'm
> > afraid I would need to do it. For example, in the aforementioned
> > case of a result of products, I have to look at each product's
> > store_id, which is stored along with the document. Is this a
> > performance killer? Maybe I should keep some tables in memory, for
> > example an array mapping from id to store_id in O(1). I will do
> > some benchmarking first anyway.
> >
> > Cheers,
> > Carlos
> >
> > On 5/24/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
> > >
> > > I know of no way to alter the Hits behavior; I recommend using
> > > a TopDocs/TopDocCollector.
> > >
> > > But be aware that if you load the document for each one, you may
> > > incur a significant penalty, although the lazy-loading helped me
> > > a lot; see FieldSelector.
> > >
> > > On 5/23/07, Carlos Pita <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi folks,
> > > >
> > > > I need to collect some global information from my first 1000
> > > > search results in order to build up some search refining
> > > > components containing only relevant values (those which
> > > > correspond to at least one of the first 1000 hits). For
> > > > example, the results are products and there is a store filter
> > > > component that shows only the stores that sell a product among
> > > > the first 1000 hits. So even if the user sees just the first
> > > > 20, I would have to inspect the first 1000. I've read that Hits
> > > > maintains a cache of about 100 or 200 hits. Is this
> > > > configurable? If I could set this cache to 1000 I would then
> > > > use Hits to browse the search results. Otherwise, I should use
> > > > HitCollector. What's your advice?
> > > >
> > > > TIA
> > > > Cheers,
> > > > Carlos