Hi there,
I would like to use Lucene to solve the following problem:

1. We have about 100k customers and 25 million documents.

2. When a customer performs a text search over the documents, we want to
return only the documents that customer has access to.

3. The number of documents a customer owns varies a lot: some own close to
23 million, some own about 10k, some own a third of the documents, etc.

What is an efficient way to use Lucene in this scenario in terms of
performance and indexing?
We have tried a number of solutions, such as:

 a) 100k boolean fields per document, each indicating whether a given
customer has access to the document.
 b) A single text field listing the customers who own the document,
e.g. (customers field: "abc abd cfx...")
 c) Option (b), with the index sharded by customer.
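
To make option (b) concrete, here is a toy sketch in plain Java of what we mean: each document carries a whitespace-separated "customers" field, and a search is the intersection of the text match with the customer match. This is not the Lucene API; the class name, method names, and field names are made up for illustration, and a real index would use Lucene's own postings and query classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index illustrating option (b). Hypothetical names, not Lucene.
public class CustomerFieldSketch {
    // field name -> term -> sorted doc ids containing that term
    private final Map<String, Map<String, TreeSet<Integer>>> index = new HashMap<>();

    // Index one document: its text and the ids of the customers who own it.
    public void add(int docId, String text, String customers) {
        indexField("text", docId, text);
        indexField("customers", docId, customers);
    }

    private void indexField(String field, int docId, String value) {
        for (String term : value.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // skip empty tokens from leading whitespace
            }
            index.computeIfAbsent(field, f -> new HashMap<>())
                 .computeIfAbsent(term, t -> new TreeSet<>())
                 .add(docId);
        }
    }

    private TreeSet<Integer> postings(String field, String term) {
        Map<String, TreeSet<Integer>> terms = index.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new TreeSet<>();
        }
        return terms.get(term);
    }

    // Equivalent of "text:term AND customers:customerId".
    public List<Integer> search(String term, String customerId) {
        TreeSet<Integer> hits = new TreeSet<>(postings("text", term.toLowerCase()));
        hits.retainAll(postings("customers", customerId.toLowerCase()));
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        CustomerFieldSketch idx = new CustomerFieldSketch();
        idx.add(1, "lucene index tuning", "abc abd");
        idx.add(2, "lucene query parsing", "abd cfx");
        idx.add(3, "billing report", "abc");
        System.out.println(idx.search("lucene", "abc")); // prints [1]
        System.out.println(idx.search("lucene", "abd")); // prints [1, 2]
    }
}
```

The sketch also shows why this option grows the index: a document owned by many customers adds one entry to the customers postings for each owner.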

Search and indexing performance for (a) was bad. (b) and (c) performed
better for search but increased both indexing time and index size.
We are also thinking about using a custom filter, but we are concerned about
the memory requirements.
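
To put a rough number on the memory concern, here is a back-of-the-envelope sketch. It assumes the filter is a dense bitset with one bit per document and that one such bitset is kept in memory for every customer. Both are assumptions for illustration; a real filter could use a sparse or compressed representation, or cache only the active customers. The class and method names are hypothetical.

```java
// Rough memory estimate for per-customer dense bitset filters.
public class FilterMemoryEstimate {
    // Bytes for one dense bitset: one bit per document, rounded up.
    public static long bytesPerFilter(long numDocs) {
        return (numDocs + 7) / 8;
    }

    // Worst case: one dense bitset held for every customer at once.
    public static long bytesForAllFilters(long numDocs, long numCustomers) {
        return bytesPerFilter(numDocs) * numCustomers;
    }

    public static void main(String[] args) {
        long numDocs = 25_000_000L;   // document count from this post
        long numCustomers = 100_000L; // customer count from this post
        System.out.println("Per customer: " + bytesPerFilter(numDocs) + " bytes");
        System.out.println("All customers: "
                + bytesForAllFilters(numDocs, numCustomers) + " bytes");
    }
}
```

Under these assumptions a single filter is about 3 MB, which is manageable, but holding one for each of the 100k customers would take roughly 312 GB, which is why we are wary of this approach.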

Any ideas/suggestions would be really appreciated.
