With Zulia, we took the alternative Uwe described: the names of all fields present in a document are added automatically to a hidden field, and fieldName:* queries are rewritten to hiddenField:fieldName. It seems to work well.
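
For anyone who wants to see the idea in isolation, here is a minimal plain-Java sketch of it. It is not Zulia's code and does not use Lucene; the class name `FieldExistsSketch`, the hidden field name `_fieldNames`, and the toy in-memory index are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FieldExistsSketch {
    // Hypothetical hidden field name; not the name Zulia actually uses.
    static final String HIDDEN_FIELD = "_fieldNames";

    // Toy inverted index: field -> term -> ids of documents containing the term.
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes a document and records the name of every non-empty field under HIDDEN_FIELD. */
    int addDocument(Map<String, List<String>> doc) {
        int docId = nextDocId++;
        for (Map.Entry<String, List<String>> field : doc.entrySet()) {
            List<String> values = field.getValue();
            if (values == null || values.isEmpty()) {
                continue; // empty fields are deliberately not recorded as present
            }
            for (String value : values) {
                post(field.getKey(), value, docId);
            }
            post(HIDDEN_FIELD, field.getKey(), docId);
        }
        return docId;
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    /** Rewrites "fieldName:*" to "HIDDEN_FIELD:fieldName"; other queries pass through unchanged. */
    static String rewrite(String query) {
        int colon = query.indexOf(':');
        if (colon > 0 && query.substring(colon + 1).equals("*")) {
            return HIDDEN_FIELD + ":" + query.substring(0, colon);
        }
        return query;
    }

    /** Runs a single "field:term" query after applying the rewrite. */
    Set<Integer> search(String query) {
        String rewritten = rewrite(query);
        int colon = rewritten.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected field:term, got " + query);
        }
        String field = rewritten.substring(0, colon);
        String term = rewritten.substring(colon + 1);
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }
}
```

With this in place, an exists query becomes a plain term lookup, and Rob's "groups_allowed is empty" case is the negation of that lookup against all documents.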
https://github.com/zuliaio/zuliasearch/blob/master/zulia-query-parser/src/main/java/io/zulia/server/search/ZuliaQueryParser.java#L218
https://github.com/zuliaio/zuliasearch/blob/master/zulia-server/src/main/java/io/zulia/server/index/ShardDocumentIndexer.java#L122

~Matt

On Fri, Nov 13, 2020 at 9:50 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> Solr and Elasticsearch implement the exists query like this, which is
> fully in line with your investigation: if a field has docvalues it uses
> DocValuesFieldExistsQuery, if it is a tokenized field it uses the
> NormsFieldExistsQuery. The negative one is a must-not clause, which is
> perfectly fine performance wise.
>
> An alternative way to search is indexing all field names that have a value
> into a separate stringfield. But this needs preprocessing.
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html
> https://issues.apache.org/jira/browse/SOLR-11437
>
> Uwe
>
> On November 13, 2020 2:19:43 PM UTC, Michael McCandless <luc...@mikemccandless.com> wrote:
> >That's great Rob! Thanks for bringing closure.
> >
> >Mike McCandless
> >
> >http://blog.mikemccandless.com
> >
> >On Fri, Nov 13, 2020 at 9:13 AM Rob Audenaerde <rob.audenae...@gmail.com> wrote:
> >
> >> To follow up, based on a quick JMH-test with 2M docs with some random data
> >> I see a speedup of 70% :)
> >> That is a nice Friday-afternoon gift, thanks!
> >>
> >> For ppl that are interested:
> >>
> >> I added a BinaryDocValues field like this:
> >>
> >> doc.add(new BinaryDocValuesField("GROUPS_ALLOWED_EMPTY", new BytesRef(0x01)));
> >>
> >> And used the finalQuery.add(new DocValuesFieldExistsQuery("GROUPS_ALLOWED_EMPTY"), BooleanClause.Occur.SHOULD);
> >>
> >> On Fri, Nov 13, 2020 at 2:09 PM Michael McCandless <luc...@mikemccandless.com> wrote:
> >>
> >> > Maybe NormsFieldExistsQuery as a MUST_NOT clause? Though, you must enable
> >> > norms on your field to use that.
> >> >
> >> > TermRangeQuery is indeed a horribly costly way to execute this, but if you
> >> > cache the result on each refresh, perhaps it is OK?
> >> >
> >> > You could also index a dedicated doc values field indicating that the
> >> > field is empty and then use DocValuesFieldExistsQuery.
> >> >
> >> > Mike McCandless
> >> >
> >> > http://blog.mikemccandless.com
> >> >
> >> > On Fri, Nov 13, 2020 at 7:56 AM Rob Audenaerde <rob.audenae...@gmail.com> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> We have implemented some security on our index by adding a field
> >> >> 'groups_allowed' to documents, and wrap a boolean must query around the
> >> >> original query, that checks if one of the given user-groups matches at
> >> >> least one groups_allowed.
> >> >>
> >> >> We chose to leave the groups_allowed field empty when the document should
> >> >> be able to be retrieved by all users, so we need to also select a document
> >> >> if the 'groups_allowed' is empty.
> >> >>
> >> >> What would be the faster Query construction to do so?
> >> >>
> >> >> Currently I use a TermRangeQuery that basically matches all values and put
> >> >> that in a MUST_NOT combined with a MatchAllDocsQuery(), but that gets
> >> >> rather slow when the number of groups is high.
> >> >>
> >> >> Thanks!
>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de