Hi Carsten,

You're right that Lucene document numbers are ephemeral, but they are stable for a given IndexReader instance. So perhaps you can use SearcherLifetimeManager to obtain a 'version' of the reader that returned the original results and store a bitset together with that version. Then, when the user searches further within this subset of documents, you pull the relevant reader from SLM using the 'version' information.
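Something along these lines maybe (an untested sketch against the 4.x APIs; originalQuery/newQuery, the subsetsByVersion map and SubsetFilter are just made-up names for illustration):

    // Untested sketch: run the original query once and remember which docs
    // matched, together with the version of the searcher that produced them.
    SearcherLifetimeManager mgr = new SearcherLifetimeManager();
    Map<Long, FixedBitSet> subsetsByVersion = new ConcurrentHashMap<Long, FixedBitSet>();

    IndexSearcher searcher = ...; // whatever searcher ran the original query
    final FixedBitSet bits = new FixedBitSet(searcher.getIndexReader().maxDoc());
    searcher.search(originalQuery, new Collector() {
      private int docBase;
      @Override public void setScorer(Scorer scorer) {}
      @Override public void collect(int doc) { bits.set(docBase + doc); }
      @Override public void setNextReader(AtomicReaderContext context) { docBase = context.docBase; }
      @Override public boolean acceptsDocsOutOfOrder() { return true; }
    });

    long version = mgr.record(searcher);   // hand this token to the client / store it in the DB
    subsetsByVersion.put(version, bits);

    // Later, when the user searches within that subset:
    IndexSearcher same = mgr.acquire(version); // null if it was pruned meanwhile, handle that
    try {
      TopDocs hits = same.search(newQuery, new SubsetFilter(subsetsByVersion.get(version)), 10);
    } finally {
      mgr.release(same);
    }

    // SubsetFilter is not a Lucene class, just a small per-segment view of the
    // top-level bitset:
    public class SubsetFilter extends Filter {
      private final FixedBitSet topLevelBits;
      public SubsetFilter(FixedBitSet topLevelBits) { this.topLevelBits = topLevelBits; }
      @Override
      public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws IOException {
        int base = context.docBase;
        FixedBitSet segmentBits = new FixedBitSet(context.reader().maxDoc());
        for (int i = 0; i < segmentBits.length(); i++) {
          if (topLevelBits.get(base + i)) segmentBits.set(i);
        }
        return BitsFilteredDocIdSet.wrap(segmentBits, acceptDocs);
      }
    }

The point is that the bitset is only meaningful against the exact reader it was built from, and that is what SLM hands you back for that version.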
I think you can write your own Pruner which prunes IndexReader instances/versions when their corresponding doc-subset tables are no longer needed (I put a rough sketch after the quoted thread below)...

Shai

On Fri, Apr 12, 2013 at 9:08 PM, SUJIT PAL <sujit....@comcast.net> wrote:
> Hi Carsten,
>
> Why not use your idea of the BooleanQuery but wrap it in a Filter instead?
> Since you are not doing any scoring (only filtering), the max boolean
> clauses limit should not apply to a filter.
>
> -sujit
>
> On Apr 12, 2013, at 7:34 AM, Carsten Schnober wrote:
>
> > Dear list,
> > I would like to create a sub-set of the documents in an index that is to
> > be used for further searches. However, the criteria that lead to the
> > creation of that sub-set are not predefined, so I think that faceted
> > search cannot be applied to my use case.
> >
> > For instance:
> > A user searches for documents that contain token 'A' in a field 'text'.
> > These results form a set of documents that is persistently stored (in a
> > database). Each document in the index has a field 'id' that identifies
> > it, so these "external" IDs are stored in the database.
> >
> > Later on, a user loads the document IDs from the database and wants to
> > execute another search on this set of documents only. However,
> > performing a search on the full index and subsequently filtering the
> > results against that list of documents takes very long if there are
> > many matches. This is obvious, as I have to retrieve the external id
> > from each matching document and check whether it is part of the desired
> > sub-set. Constructing a BooleanQuery in the style "id:Doc1 OR id:Doc2 ..."
> > is not suitable either, because there could be thousands of documents,
> > exceeding any limit for Boolean clauses.
> >
> > Any suggestions how to solve this? I would have gone for the Lucene
> > document numbers and stored them as a bit set that I could use as a
> > filter during later searches, but I read that the document numbers are
> > ephemeral.
> >
> > One possible way out seems to be to create another index from the
> > documents that matched the initial search, but this seems like overkill,
> > especially if there are plenty of them...
> >
> > Thanks for any hint!
> > Carsten
> >
> > --
> > Institut für Deutsche Sprache | http://www.ids-mannheim.de
> > Projekt KorAP | http://korap.ids-mannheim.de
> > Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
> > Korpusanalyseplattform der nächsten Generation
> > Next Generation Corpus Analysis Platform
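P.S. The custom Pruner I mentioned could be as simple as something like this (again an untested sketch; subsetsByVersion is the same made-up map from the snippet above, and record() keys searchers by the DirectoryReader version, so the lookup matches):

    mgr.prune(new SearcherLifetimeManager.Pruner() {
      @Override
      public boolean doPrune(double ageSec, IndexSearcher searcher) {
        // record() keyed this searcher by its DirectoryReader version
        long version = ((DirectoryReader) searcher.getIndexReader()).getVersion();
        // drop searchers whose subset table is gone, or that are older than an hour
        return !subsetsByVersion.containsKey(version) || ageSec > 3600.0;
      }
    });

prune() throws IOException and is meant to be called periodically, e.g. from a background thread; there is also the built-in SearcherLifetimeManager.PruneByAge if age alone is enough for you.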