Re: Using Lucene to model ownership of documents

2016-06-22 Thread Geebee Coder
Thanks Denis. My mistake. For a and b, indexing speed, size and search performance were similar. I agree on the simplicity comment. For anyone who might come across this, here's our best solution so far (for Elasticsearch): for every customer, use Elasticsearch's nested fields, e.g. ownership of

Re: Using Lucene to model ownership of documents

2016-06-16 Thread Denis Bazhenov
The speed for a and b should be the same, at least from a conceptual point of view. The number of terms generated for each scenario is equal, so index size and vocabulary size should be the same. I'm wondering why there is a difference. It seems like there is some penalty for writing/readi

Re: Using Lucene to model ownership of documents

2016-06-16 Thread Geebee Coder
Thank you all. Michael, do you mean grouping customers by category (e.g. customer A has premium access and so does customer B, so they have access to the same set of documents)? If so, unfortunately we don't have such categories of customers; their access rights are over specific do

Re: Using Lucene to model ownership of documents

2016-06-16 Thread Michael Wilkowski
Definitely b). I would also suggest using groups, and expanding a user's groups at sign-in time. MW On Thu, Jun 16, 2016 at 12:36 PM, Ian Lea wrote: > I'd definitely go for b). The index will of course be larger for every > extra bit of data you store but it doesn't sound like this would make much >
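A minimal sketch of the "expand user groups at sign-in" idea, assuming each document lists the principals (user IDs or group IDs) allowed to read it. The names `USER_GROUPS`, `DOC_ACL`, `expand_principals` and `visible_docs` are hypothetical, and this plain-Python model only shows the set logic, not Lucene's API. In Lucene the equivalent would usually be a non-scoring filter clause over a keyword field holding principal IDs.

```python
# Toy model of group expansion at sign-in time (not Lucene's API).
from typing import Dict, List, Set

# Hypothetical data: group membership per user.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"group:premium"},
    "bob": {"group:premium", "group:legal"},
    "carol": set(),
}

# Hypothetical data: principals allowed to read each document.
DOC_ACL: Dict[str, Set[str]] = {
    "doc1": {"group:premium"},
    "doc2": {"user:carol"},
    "doc3": {"group:legal", "user:alice"},
}


def expand_principals(user: str) -> Set[str]:
    """Run once at sign-in: the user's own ID plus every group they belong to."""
    return {f"user:{user}"} | USER_GROUPS.get(user, set())


def visible_docs(principals: Set[str]) -> List[str]:
    """A document is visible if its ACL shares at least one principal with the user."""
    return sorted(doc for doc, acl in DOC_ACL.items() if acl & principals)
```

For example, `visible_docs(expand_principals("alice"))` returns `["doc1", "doc3"]`: doc1 through the premium group and doc3 through the direct user grant. The point of expanding at sign-in is that group membership is resolved once per session, so each search only has to match a short list of principal terms.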

Re: Using Lucene to model ownership of documents

2016-06-16 Thread Ian Lea
I'd definitely go for b). The index will of course be larger for every extra bit of data you store but it doesn't sound like this would make much difference. Likewise for speed of indexing. -- Ian. On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder wrote: > Hi there, > I would like to use Lucene

Using Lucene to model ownership of documents

2016-06-15 Thread Geebee Coder
Hi there, I would like to use Lucene to solve the following problem:
1. We have about 100k customers and 25 million documents.
2. When a customer performs a text search on the document space, we want to return only documents that the customer has access to.
3. The # of documents a custo
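Requirement 2 amounts to intersecting the text-match results with the set of documents the customer may read. A minimal sketch of that idea, assuming one "access" term per allowed customer is indexed alongside each document. `ToyIndex` and its methods are hypothetical, this is a plain-Python inverted index rather than Lucene, and it is not necessarily either of the options a) and b) discussed in the replies.

```python
# Toy inverted index with a per-customer access filter (not Lucene's API).
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ToyIndex:
    def __init__(self) -> None:
        # token -> doc IDs containing it
        self.text_postings: Dict[str, Set[int]] = defaultdict(set)
        # customer ID -> doc IDs that customer may read
        self.access_postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str, customers: Iterable[str]) -> None:
        for token in text.lower().split():
            self.text_postings[token].add(doc_id)
        # One access term per customer allowed to see this document.
        for customer in customers:
            self.access_postings[customer].add(doc_id)

    def search(self, customer: str, query: str) -> List[int]:
        tokens = query.lower().split()
        if not tokens:
            return []
        # AND the query terms together (.get avoids creating empty entries)...
        hits = set.intersection(*(self.text_postings.get(t, set()) for t in tokens))
        # ...then keep only documents this customer has access to.
        return sorted(hits & self.access_postings.get(customer, set()))
```

For example, after indexing doc 1 ("Quarterly report", customers c1 and c2) and doc 2 ("Annual report", customer c2), a search for "report" returns `[1]` for c1 and `[1, 2]` for c2. The access postings behave like any other term, so filtering costs one more posting-list intersection per query.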