On Thu, Mar 25, 2010 at 21:41, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> This depends on the particulars of filter... but in general you
> shouldn't have to consume more RAM, I think?  Ie you should be able to
> do your computation against the top-level reader, and then store the
> results of your computation per-sub-reader.
I am having issues figuring out how to get a reference to the top-level
reader. The API passes the sub-readers in one by one, and I can't see a
way to find the top-level reader for the one which was passed in. I also
can't easily cheat by passing the top-level reader into the Filter
constructor, because filters are serialisable and that kind of thing
won't survive serialisation.

To throw an additional spanner in the works, the behaviour I need is
that only the *last* matching document should be returned. So even if a
certain document matches the filter after N readers have been passed in,
it might no longer match after N+1 readers have been passed in.
Essentially I need a method like:

    DocIdSet[] getDocIdSets(IndexReader[] readers);

where the readers are guaranteed to be in order of docBase.

By the way, I notice that the order in which the readers are passed to
the method is essentially undocumented. The test code appears to assume
they will be passed in the natural order of the documents (which is
logical), but couldn't a future change parallelise segment searches for
performance reasons, thus reordering the calls?

It would also be nice if the API explicitly passed the docBase for each
IndexReader. This would remove the need to do the maths to determine the
docBase ourselves, and would also make it possible to parallelise those
calls later.

Daniel

--
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                       The world's most advanced
Nuix                                   email data analysis
http://nuix.com/                       and eDiscovery software
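P.S. To make the "last document wins" idea concrete, here is a rough
sketch of the shape of API I am proposing. None of this is real Lucene
API: getDocIdSets is the method I wish existed, LastOccurrenceFilter and
keyField are made-up names, and I am assuming "last" is defined per some
single-valued key field (loaded eagerly via FieldCache purely to keep
the sketch short, memory cost notwithstanding).

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.OpenBitSet;

/**
 * Sketch only: keeps the single newest document for each value of
 * keyField, across all segments. "Newest" means highest top-level
 * docId, which is only meaningful if the readers arrive in ascending
 * docBase order.
 */
public class LastOccurrenceFilter {
  private final String keyField;  // made-up field name, for illustration

  public LastOccurrenceFilter(String keyField) {
    this.keyField = keyField;
  }

  // The signature I am proposing; it does not exist in Lucene today.
  public DocIdSet[] getDocIdSets(IndexReader[] readers) throws IOException {
    OpenBitSet[] sets = new OpenBitSet[readers.length];
    Set<String> seen = new HashSet<String>();

    // Walk segments and documents backwards, so the last occurrence of
    // each key is the first one encountered and therefore the one kept.
    for (int r = readers.length - 1; r >= 0; r--) {
      sets[r] = new OpenBitSet(readers[r].maxDoc());
      String[] keys = FieldCache.DEFAULT.getStrings(readers[r], keyField);
      for (int doc = readers[r].maxDoc() - 1; doc >= 0; doc--) {
        if (readers[r].isDeleted(doc)) {
          continue;
        }
        String key = keys[doc];
        if (key != null && seen.add(key)) {
          sets[r].set(doc);  // the newest document for this key
        }
      }
    }
    return sets;
  }
}

Walking backwards means the newest document claims each key first, which
is exactly why the ordering guarantee matters: without it, "last" is not
well defined. For contrast, the only way I can see to recover docBase
today is roughly the bookkeeping below (same imports as above, plus
org.apache.lucene.search.Filter), which silently breaks if the calls are
ever reordered or the filter instance is reused:

// Fragile: assumes getDocIdSet is called exactly once per segment,
// in docBase order, for each search.
public class DocBaseTrackingFilter extends Filter {
  private transient int nextDocBase = 0;

  @Override
  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    int docBase = nextDocBase;       // the maths we do ourselves today
    nextDocBase += reader.maxDoc();  // ready for the next segment
    // ... compute a per-segment DocIdSet using docBase ...
    return new OpenBitSet(reader.maxDoc());  // placeholder result
  }
}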