Dear list, I would like to create a sub-set of the documents in an index that is to be used for further searches. However, the criteria that lead to the creation of that sub-set are not predefined, so I think that faceted search cannot be applied to this use case.
For instance: a user searches for documents that contain the token 'A' in a field 'text'. These results form a set of documents that is stored persistently (in a database). Each document in the index has a field 'id' that identifies it, so these "external" IDs are what is stored in the database.

Later on, a user loads the document IDs from the database and wants to execute another search on this set of documents only. However, performing a search on the full index and subsequently filtering the results against that list of documents takes very long if there are many matches. This is to be expected, as I have to retrieve the external ID from each matching document and check whether it is part of the desired sub-set. Constructing a BooleanQuery in the style "id:Doc1 OR id:Doc2 ..." is not suitable either, because there could be thousands of documents, exceeding any limit for Boolean clauses.

Any suggestions on how to solve this? I would have gone for the Lucene document numbers and stored them as a bit set that I could use as a filter during later searches, but I read that the document numbers are ephemeral.

One possible way out seems to be to create another index from the documents that matched the initial search, but this seems like overkill, especially if there are plenty of them...

Thanks for any hint!

Carsten

--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
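
P.S. To make the slow post-filtering step described in this message concrete, here is a minimal plain-Java sketch. It is only an illustration under assumptions: the class and method names are hypothetical, no Lucene classes are used, and the list of strings stands in for the 'id' field values that would have to be read from every matching document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: models filtering search hits
// against the set of external IDs loaded from the database.
class SubsetPostFilter {

    // hitIds:    the 'id' field value of every document matched by the new search
    //            (in a real setup each value requires loading the stored field).
    // storedIds: the external IDs of the previously saved sub-set.
    // Returns the hits that belong to the sub-set, in their original order.
    static List<String> restrictToSubset(List<String> hitIds, Set<String> storedIds) {
        List<String> kept = new ArrayList<>();
        for (String id : hitIds) {
            // One lookup per hit: the work grows with the total number of
            // matches in the full index, not with the size of the sub-set.
            if (storedIds.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

The set lookup itself is cheap; the cost comes from having to visit every match of the second search and fetch its 'id' before the check can be made, which is why this gets slow when there are many matches.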