Hi,

Solr and Elasticsearch implement the exists query like this, which is fully in line with your investigation: if a field has doc values, they use DocValuesFieldExistsQuery; if it is a tokenized field with norms enabled, they use NormsFieldExistsQuery. The negated form is simply a MUST_NOT clause, which is perfectly fine performance-wise.
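To make the negated form concrete, here is a minimal sketch, assuming Lucene 8.x on the classpath (in later releases DocValuesFieldExistsQuery was superseded by FieldExistsQuery). The class name MissingFieldQueryDemo, the helper names, and the sample documents are made up for illustration. Note that a BooleanQuery with only MUST_NOT clauses matches nothing, so the sketch pairs the clause with a MatchAllDocsQuery.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DocValuesFieldExistsQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

// Hypothetical demo class; not taken from Solr, Elasticsearch, or this thread.
final class MissingFieldQueryDemo {

    /** Matches documents that have no doc value for the given field. */
    static Query missing(String field) {
        return new BooleanQuery.Builder()
            // A positive clause is required; MUST_NOT alone matches nothing.
            .add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)
            .add(new DocValuesFieldExistsQuery(field), BooleanClause.Occur.MUST_NOT)
            .build();
    }

    /** Indexes one restricted and one unrestricted document, then counts the unrestricted ones. */
    static int countMissing() {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {

            // Restricted document: the field is indexed and also has doc values.
            Document restricted = new Document();
            restricted.add(new StringField("groups_allowed", "admins", Field.Store.NO));
            restricted.add(new SortedSetDocValuesField("groups_allowed", new BytesRef("admins")));
            writer.addDocument(restricted);

            // Unrestricted document: no groups_allowed field at all.
            Document open = new Document();
            open.add(new StringField("title", "public", Field.Store.NO));
            writer.addDocument(open);
            writer.commit();

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(missing("groups_allowed"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("docs without groups_allowed: " + countMissing());
    }
}
```

With the two sample documents above, only the one without groups_allowed should be counted. For a tokenized field without doc values, NormsFieldExistsQuery could be swapped in the same way, provided norms are enabled on that field.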
An alternative is to index the names of all fields that have a value into a separate StringField and search on that. But this needs preprocessing.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html
https://issues.apache.org/jira/browse/SOLR-11437

Uwe

On November 13, 2020 2:19:43 PM UTC, Michael McCandless <luc...@mikemccandless.com> wrote:
>That's great Rob! Thanks for bringing closure.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>
>On Fri, Nov 13, 2020 at 9:13 AM Rob Audenaerde <rob.audenae...@gmail.com> wrote:
>
>> To follow up, based on a quick JMH test with 2M docs with some random data
>> I see a speedup of 70% :)
>> That is a nice Friday-afternoon gift, thanks!
>>
>> For ppl that are interested:
>>
>> I added a BinaryDocValues field like this:
>>
>> doc.add(new BinaryDocValuesField("GROUPS_ALLOWED_EMPTY", new BytesRef(0x01)));
>>
>> And used the finalQuery.add(new DocValuesFieldExistsQuery("GROUPS_ALLOWED_EMPTY"), BooleanClause.Occur.SHOULD);
>>
>> On Fri, Nov 13, 2020 at 2:09 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>>
>> > Maybe NormsFieldExistsQuery as a MUST_NOT clause? Though, you must enable
>> > norms on your field to use that.
>> >
>> > TermRangeQuery is indeed a horribly costly way to execute this, but if you
>> > cache the result on each refresh, perhaps it is OK?
>> >
>> > You could also index a dedicated doc values field indicating that the
>> > field is empty and then use DocValuesFieldExistsQuery.
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> >
>> > On Fri, Nov 13, 2020 at 7:56 AM Rob Audenaerde <rob.audenae...@gmail.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> We have implemented some security on our index by adding a field
>> >> 'groups_allowed' to documents, and wrapping a boolean must query around the
>> >> original query that checks whether one of the given user groups matches at
>> >> least one groups_allowed.
>> >>
>> >> We chose to leave the groups_allowed field empty when the document should
>> >> be retrievable by all users, so we also need to select a document if
>> >> the 'groups_allowed' is empty.
>> >>
>> >> What would be the fastest Query construction to do so?
>> >>
>> >> Currently I use a TermRangeQuery that basically matches all values and put
>> >> that in a MUST_NOT combined with a MatchAllDocsQuery(), but that gets
>> >> rather slow when the number of groups is high.
>> >>
>> >> Thanks!
>> >>
>> >
>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de