Hi there,

I would like to use Lucene to solve the following problem:

1. We have about 100k customers and about 25 million documents.
2. When a customer performs a text search on the document space, we want to return only the documents that the customer has access to.
3. The number of documents a customer owns varies a lot: some own close to 23 million, some own close to 10k, some own a third of the documents, and so on.

What is an efficient way to use Lucene in this scenario, in terms of search performance and indexing?

We have tried a number of solutions:

a) 100k boolean fields per document, each indicating whether a given customer has access to the document.
b) A single text field listing the customers who own the document, e.g. (customers field: "abc abd cfx...").
c) Option b) with the index sharded by customer.

Search and indexing performance for a) was bad. Options b) and c) performed better for search, but increased both the indexing time and the index size.

We are also thinking about using a custom filter, but we are concerned about the memory requirements.

Any ideas or suggestions would be really appreciated.
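
To put a rough number on the memory concern with a custom filter, here is a back-of-the-envelope sketch. It assumes the filter would keep either one bit per document in the index (dense) or a list of 4-byte document ids (sparse) for each customer. The class and method names are hypothetical and are not part of Lucene, and the figures ignore object overhead.

```java
public class FilterMemoryEstimate {
    // Figures taken from the problem description above.
    static final long TOTAL_DOCS = 25_000_000L;
    static final long TOTAL_CUSTOMERS = 100_000L;

    /** Bytes for a dense bit set with one bit per document in the index. */
    static long denseBytes(long totalDocs) {
        return (totalDocs + 7) / 8; // round up to whole bytes
    }

    /** Bytes for a list of 4-byte document ids (assumed sparse form). */
    static long sparseBytes(long ownedDocs) {
        return ownedDocs * 4L;
    }

    /** Size of the cheaper of the two representations for one customer. */
    static long bestBytes(long totalDocs, long ownedDocs) {
        return Math.min(denseBytes(totalDocs), sparseBytes(ownedDocs));
    }

    public static void main(String[] args) {
        System.out.println("dense, one customer:      " + denseBytes(TOTAL_DOCS) + " bytes");
        System.out.println("dense, all customers:     " + denseBytes(TOTAL_DOCS) * TOTAL_CUSTOMERS + " bytes");
        System.out.println("sparse, 10k-doc customer: " + sparseBytes(10_000L) + " bytes");
        System.out.println("best, 23M-doc customer:   " + bestBytes(TOTAL_DOCS, 23_000_000L) + " bytes");
    }
}
```

Under these assumptions a dense filter costs about 3.1 MB per customer, so keeping one in memory for all 100k customers at once would take roughly 312 GB. A 10k-document customer would need only about 40 KB as an id list, which suggests the cost depends heavily on which customers' filters are cached and in which form.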