On Thu, Mar 25, 2010 at 21:41, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> This depends on the particulars of filter... but in general you
> shouldn't have to consume more RAM, I think?  Ie you should be able to
> do your computation against the top-level reader, and then store the
> results of your computation per-sub-reader.
I am having issues figuring out how to get a reference to the top-level
reader. The API passes the sub-readers in one by one, and I can't see a
way to find the top-level reader for the one which was passed in. I also
can't easily cheat by passing the top-level reader into the Filter
constructor, because filters are serialisable and that kind of thing
won't survive serialisation.

To throw an additional spanner in the works, the behaviour I need is
that only the *last* matching document should be returned. So even if a
certain document matches the filter after N readers have been passed in,
it might no longer match after N+1 readers have been passed in.
Essentially I need a method like:

    DocIdSet[] getDocIdSets(IndexReader[] readers);

where the readers are guaranteed to be in order of docBase.

By the way, I notice that the order in which the readers are passed to
the method is essentially undocumented. The test code appears to assume
they will be passed in the natural order of the documents (which is
logical), but couldn't a future change parallelise segment searches for
performance reasons, thus reordering the calls?

It would also be nice if the API explicitly passed the docBase for each
IndexReader. This would remove the need to do the maths to determine the
docBase ourselves, and would also make it possible to parallelise those
calls later.

Daniel

--
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                       The world's most advanced
Nuix                                   email data analysis
http://nuix.com/                       and eDiscovery software
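P.S. To make the "last document wins" idea concrete, here is a rough
sketch of the shape of API I am proposing. None of this is real Lucene
API: getDocIdSets is the method I wish existed, LastOccurrenceFilter and
keyField are made-up names, and I am assuming "last" is defined per some
single-valued key field (loaded eagerly via FieldCache purely to keep
the sketch short, memory cost notwithstanding).

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.OpenBitSet;

/**
 * Sketch only: keeps the single newest document for each value of
 * keyField, across all segments. "Newest" means highest top-level
 * docId, which is only meaningful if the readers arrive in ascending
 * docBase order.
 */
public class LastOccurrenceFilter {
  private final String keyField;  // made-up field name, for illustration

  public LastOccurrenceFilter(String keyField) {
    this.keyField = keyField;
  }

  // The signature I am proposing; it does not exist in Lucene today.
  public DocIdSet[] getDocIdSets(IndexReader[] readers) throws IOException {
    OpenBitSet[] sets = new OpenBitSet[readers.length];
    Set<String> seen = new HashSet<String>();

    // Walk segments and documents backwards, so the last occurrence of
    // each key is the first one encountered and therefore the one kept.
    for (int r = readers.length - 1; r >= 0; r--) {
      sets[r] = new OpenBitSet(readers[r].maxDoc());
      String[] keys = FieldCache.DEFAULT.getStrings(readers[r], keyField);
      for (int doc = readers[r].maxDoc() - 1; doc >= 0; doc--) {
        if (readers[r].isDeleted(doc)) {
          continue;
        }
        String key = keys[doc];
        if (key != null && seen.add(key)) {
          sets[r].set(doc);  // the newest document for this key
        }
      }
    }
    return sets;
  }
}

Walking backwards means the newest document claims each key first, which
is exactly why the ordering guarantee matters: without it, "last" is not
well defined. For contrast, the only way I can see to recover docBase
today is roughly the bookkeeping below (same imports as above, plus
org.apache.lucene.search.Filter), which silently breaks if the calls are
ever reordered or the filter instance is reused:

// Fragile: assumes getDocIdSet is called exactly once per segment,
// in docBase order, for each search.
public class DocBaseTrackingFilter extends Filter {
  private transient int nextDocBase = 0;

  @Override
  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    int docBase = nextDocBase;       // the maths we do ourselves today
    nextDocBase += reader.maxDoc();  // ready for the next segment
    // ... compute a per-segment DocIdSet using docBase ...
    return new OpenBitSet(reader.maxDoc());  // placeholder result
  }
}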