With Zulia we took the alternative Uwe described: every field name that is
present in a document is automatically added to a hidden field, and
fieldName:* queries are rewritten to hiddenField:fieldName.  It seems to
work well.

https://github.com/zuliaio/zuliasearch/blob/master/zulia-query-parser/src/main/java/io/zulia/server/search/ZuliaQueryParser.java#L218
https://github.com/zuliaio/zuliasearch/blob/master/zulia-server/src/main/java/io/zulia/server/index/ShardDocumentIndexer.java#L122
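
For anyone who wants the gist without opening the links, here is a minimal
sketch in plain Java of the two halves of the approach: collecting the
populated field names at index time and rewriting fieldName:* at query time.
It is not Zulia's actual code. The class name, the hidden field name
"_fieldNames", and both method names are assumptions made up for
illustration. A real implementation would presumably index each name as a
Lucene StringField and produce a TermQuery instead of returning strings.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "hidden field of field names" idea; not Zulia's code.
final class FieldExistsSketch {
    // Assumed name of the hidden field that lists every populated field.
    static final String HIDDEN_FIELD = "_fieldNames";

    // Matches a whole clause of the form fieldName:* (simple field names only).
    private static final Pattern EXISTS_CLAUSE =
            Pattern.compile("^([A-Za-z_][A-Za-z0-9_.]*):\\*$");

    // Query side: "fieldName:*" becomes "hiddenField:fieldName";
    // any other clause is returned unchanged.
    static String rewriteClause(String clause) {
        Matcher m = EXISTS_CLAUSE.matcher(clause.trim());
        return m.matches() ? HIDDEN_FIELD + ":" + m.group(1) : clause;
    }

    // Index side: names of all fields that carry a non-null, non-empty value.
    // These are the values that would be indexed into the hidden field.
    static Set<String> populatedFieldNames(Map<String, ?> doc) {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            Object value = e.getValue();
            if (value != null && !value.toString().isEmpty()) {
                names.add(e.getKey());
            }
        }
        return names;
    }
}
```

With this, a document whose groups_allowed value is empty gets no
"groups_allowed" entry in the hidden field, so a must-not clause on
_fieldNames:groups_allowed selects exactly the documents without that field.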

~Matt

On Fri, Nov 13, 2020 at 9:50 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> Solr and Elasticsearch implement the exists query like this, which is
> fully in line with your investigation: if a field has docvalues it uses
> DocValuesFieldExistsQuery, if it is a tokenized field it uses the
> NormsFieldExistsQuery. The negative one is a must-not clause, which is
> perfectly fine performance wise.
>
> An alternative way to search is indexing all field names that have a value
> into a separate stringfield. But this needs preprocessing.
>
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html
>
> https://issues.apache.org/jira/browse/SOLR-11437
>
> Uwe
>
> Am November 13, 2020 2:19:43 PM UTC schrieb Michael McCandless <
> luc...@mikemccandless.com>:
> >That's great Rob!  Thanks for bringing closure.
> >
> >Mike McCandless
> >
> >http://blog.mikemccandless.com
> >
> >
> >On Fri, Nov 13, 2020 at 9:13 AM Rob Audenaerde
> ><rob.audenae...@gmail.com>
> >wrote:
> >
> >> To follow up, based on a quick JMH test with 2M docs with some random
> >> data I see a speedup of 70% :)
> >> That is a nice Friday-afternoon gift, thanks!
> >>
> >> For ppl that are interested:
> >>
> >> I added a BinaryDocValues field like this:
> >>
> >> doc.add(new BinaryDocValuesField("GROUPS_ALLOWED_EMPTY", new BytesRef(0x01)));
> >>
> >> And added it to the query like this:
> >>
> >> finalQuery.add(new DocValuesFieldExistsQuery("GROUPS_ALLOWED_EMPTY"), BooleanClause.Occur.SHOULD);
> >>
> >> On Fri, Nov 13, 2020 at 2:09 PM Michael McCandless <
> >> luc...@mikemccandless.com> wrote:
> >>
> >> > Maybe NormsFieldExistsQuery as a MUST_NOT clause?  Though, you must
> >> > enable norms on your field to use that.
> >> >
> >> > TermRangeQuery is indeed a horribly costly way to execute this, but if
> >> > you cache the result on each refresh, perhaps it is OK?
> >> >
> >> > You could also index a dedicated doc values field indicating that the
> >> > field is empty and then use DocValuesFieldExistsQuery.
> >> >
> >> > Mike McCandless
> >> >
> >> > http://blog.mikemccandless.com
> >> >
> >> >
> >> > On Fri, Nov 13, 2020 at 7:56 AM Rob Audenaerde <rob.audenae...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> We have implemented some security on our index by adding a field
> >> >> 'groups_allowed' to documents and wrapping a boolean MUST query around
> >> >> the original query, which checks that at least one of the given user
> >> >> groups matches at least one value in groups_allowed.
> >> >>
> >> >> We chose to leave the groups_allowed field empty when the document
> >> >> should be retrievable by all users, so we also need to select a
> >> >> document if its 'groups_allowed' field is empty.
> >> >>
> >> >> What would be the fastest query construction for this?
> >> >>
> >> >>
> >> >> Currently I use a TermRangeQuery that basically matches all values and
> >> >> put that in a MUST_NOT combined with a MatchAllDocsQuery, but that gets
> >> >> rather slow when the number of groups is high.
> >> >>
> >> >> Thanks!
> >> >>
> >> >
> >>
>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
