Hi, it's definitely an interesting question; I happened to work on ACL designs myself in the past.
It has been a while since I last looked at the Lucene/Solr internals of that part, but first of all, I suspect you will get a performance boost if you store documents and ACLs in the same collection (index).
For the Join Query Parser parameters (https://solr.apache.org/guide/8_8/other-parsers.html#parameters), I would go with *score=none* and *method=topLevelDV*. It would definitely be interesting to compare it with *method=index*.
Then I would take a look at the caches involved and tune them appropriately. Further improvements are possible, but that would require investigating the internals a bit more.

Another alternative could be to denormalize and put the reader information directly in the original document (rather than the ACL ID). This of course brings other observations and consequences.

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Tue, 23 Mar 2021 at 07:21, k-jingyang <k.jingy...@protonmail.com.invalid> wrote:

> Hello everyone,
>
> I have a use case for my users which I'm having issues implementing. Hoping
> to find some insights here.
>
> We are trying to let our users search for almost any content data that they
> have, while respecting access control policies. My users are grouped into
> teams, and policies are applied on the content based on teams.
>
> How we are doing it now is by storing any piece of data as a document,
>
> {
> type: contact_number,
> value: 1234567890,
> aclId: 1_contact_number
> }
>
> in our content index (6 million documents)
>
> and
>
> {
> aclId: 1_contact_number,
> canRead: TEAM_A
> }
>
> in our acl index (2 million documents).
>
> DocValues is enabled for aclId on both indexes.
>
> During query, we'll query the content index and use the Join Query Parser in
> the fq as such, fq={!join from=aclId fromIndex=acl to=acl_id}canRead: TEAM_A
> OR TEAM_B, where the user is part of TEAM_A and TEAM_B. This takes close to
> 8 seconds for uncached queries.
>
> Based on my understanding, this is slow because Solr has to
> 1. Retrieve all hits from the acl index
> 2. Comb through the entire content index, finding documents whose aclId
> matches those hits from the acl index
> 3. Apply any remaining content query to filter the results from the content
> index
>
> We have also tried using {!join ... score=none} (based on what we Googled)
>
> Thoughts on improving this
>
> - Thought of using streaming expressions but using /export on the content
>   index requires sorting by fields other than the score
> - Querying the content index based on just the content, get the results
>   filter based on acl on our backend until we have the first 10 results.
>   - This requires us to load the entire acl
>   - Repeatedly query content index if documents keep getting dropped
>     because of acl
>   - Benefit is that we don't have to comb the entire content index
>   - This could be a Plugin? (not sure if it's worth the effort)
>
> Am I barking up the wrong tree?
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
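For reference, here is a small Python sketch of the two options in the reply: building the same-collection join fq with the suggested local params, and denormalizing the reader teams into the content documents so the ACL check becomes a plain filter. It is only a sketch under assumptions: the document shapes follow the question, the multi-valued canRead field on content documents and all helper names are hypothetical, team ids are assumed to need no escaping, and nothing here talks to Solr or mirrors how Solr actually executes a join. One detail worth noting: in the standard query syntax, canRead: TEAM_A OR TEAM_B applies the canRead field only to TEAM_A (TEAM_B is searched in the default field), so the helpers group the teams in parentheses.

```python
from collections import defaultdict


def build_acl_map(acl_docs):
    """Collect the teams allowed to read each aclId.

    Works whether there is one acl document per (aclId, team) pair or a
    single acl document with a multi-valued canRead.
    """
    readers = defaultdict(set)
    for acl in acl_docs:
        teams = acl["canRead"]
        if isinstance(teams, str):
            teams = [teams]
        readers[acl["aclId"]].update(teams)
    return readers


def denormalize(content_docs, acl_docs):
    """Return copies of the content docs with the reader teams inlined.

    Hypothetical schema: canRead becomes a multi-valued field on every
    content document; the input documents are left untouched.
    """
    readers = build_acl_map(acl_docs)
    enriched = []
    for doc in content_docs:
        new_doc = dict(doc)
        # An aclId with no acl entry yields an empty list: no team can read it.
        new_doc["canRead"] = sorted(readers.get(doc["aclId"], ()))
        enriched.append(new_doc)
    return enriched


def acl_filter_query(user_teams):
    """fq on canRead; the parentheses keep every team bound to that field.

    Assumes team ids contain no spaces or query-syntax special characters.
    """
    return "canRead:(" + " OR ".join(user_teams) + ")"


def join_filter_query(user_teams, method="topLevelDV"):
    """Join fq for acl docs stored in the same collection as the content.

    No fromIndex, so the join runs inside the queried collection. Acl docs
    also carry aclId, so the main query (or another fq) must exclude them.
    Pass method="index" to compare the two join algorithms.
    """
    return ("{!join from=aclId to=aclId score=none method=" + method + "}"
            + acl_filter_query(user_teams))


def visible(docs, user_teams):
    """In-memory check of which denormalized docs the canRead fq should keep."""
    teams = set(user_teams)
    return [d for d in docs if teams.intersection(d["canRead"])]


if __name__ == "__main__":
    # Shaped like the documents in the question; the second content doc
    # and its acl entry are made up for illustration.
    content = [
        {"type": "contact_number", "value": "1234567890", "aclId": "1_contact_number"},
        {"type": "contact_number", "value": "5555555555", "aclId": "2_contact_number"},
    ]
    acls = [
        {"aclId": "1_contact_number", "canRead": "TEAM_A"},
        {"aclId": "2_contact_number", "canRead": "TEAM_C"},
    ]
    docs = denormalize(content, acls)
    print(acl_filter_query(["TEAM_A", "TEAM_B"]))  # canRead:(TEAM_A OR TEAM_B)
    print(join_filter_query(["TEAM_A", "TEAM_B"]))
    print([d["value"] for d in visible(docs, ["TEAM_A", "TEAM_B"])])  # ['1234567890']
```

The trade-off with denormalization is presumably on the indexing side: a change to who can read an aclId means updating every content document carrying that aclId, instead of a single ACL document.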