Hi,

Solr and Elasticsearch implement the exists query in exactly this way, which is
fully in line with your investigation: if a field has doc values, they use
DocValuesFieldExistsQuery; if it is a tokenized field with norms, they use
NormsFieldExistsQuery. The negated form is a must-not clause, which is
perfectly fine performance-wise.
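
A minimal sketch of how that could look for the groups_allowed case from this
thread. It assumes Lucene 8.x on the classpath and that groups_allowed is
indexed both as a StringField (for the TermQuery) and with doc values (for the
exists query). The class and method names are made up for illustration.

```java
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DocValuesFieldExistsQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper, not part of Lucene.
final class GroupAccessQueries {
    static final String FIELD = "groups_allowed";

    // Matches documents that have no value in the given doc values field.
    // A must-not clause alone matches nothing, so it is paired with a
    // MatchAllDocsQuery.
    static Query missingField(String field) {
        return new BooleanQuery.Builder()
            .add(new MatchAllDocsQuery(), Occur.MUST)
            .add(new DocValuesFieldExistsQuery(field), Occur.MUST_NOT)
            .build();
    }

    // Restricts the original query to documents that either list one of the
    // user's groups or have no groups_allowed value at all.
    static Query restrict(Query original, List<String> userGroups) {
        BooleanQuery.Builder access = new BooleanQuery.Builder();
        for (String group : userGroups) {
            access.add(new TermQuery(new Term(FIELD, group)), Occur.SHOULD);
        }
        access.add(missingField(FIELD), Occur.SHOULD);

        return new BooleanQuery.Builder()
            .add(original, Occur.MUST)
            .add(access.build(), Occur.FILTER) // filter only, no scoring impact
            .build();
    }
}
```

The access clause is added as FILTER so that the group check does not
influence scoring of the original query.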

An alternative is to index the names of all fields that have a value into a
separate StringField and search on that. But this needs preprocessing at
index time.
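
The preprocessing step could be sketched roughly as below. This is plain Java
with no Lucene dependency; the field name "field_names" and the helper class
are assumptions made for illustration, and the input is assumed to be a map
from field name to that field's values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects the names of all fields that carry a value.
final class FieldNames {
    // Assumed name of the extra field that stores the field names.
    static final String FIELD_NAMES = "field_names";

    static List<String> nonEmpty(Map<String, List<String>> fields) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : fields.entrySet()) {
            List<String> values = entry.getValue();
            if (values != null && !values.isEmpty()) {
                names.add(entry.getKey());
            }
        }
        Collections.sort(names); // stable order, independent of map iteration
        return names;
    }
}
```

At index time each returned name would be added to the document as
new StringField(FieldNames.FIELD_NAMES, name, Field.Store.NO). A document
without groups_allowed can then be found with a
TermQuery(new Term(FieldNames.FIELD_NAMES, "groups_allowed")) used as a
must-not clause.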

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html

https://issues.apache.org/jira/browse/SOLR-11437

Uwe

On November 13, 2020 2:19:43 PM UTC, Michael McCandless
<luc...@mikemccandless.com> wrote:
>That's great Rob!  Thanks for bringing closure.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>
>On Fri, Nov 13, 2020 at 9:13 AM Rob Audenaerde <rob.audenae...@gmail.com>
>wrote:
>
>> To follow up, based on a quick JMH-test with 2M docs with some random data
>> I see a speedup of 70% :)
>> That is a nice Friday-afternoon gift, thanks!
>>
>> For ppl that are interested:
>>
>> I added a BinaryDocValues field like this:
>>
>> doc.add(new BinaryDocValuesField("GROUPS_ALLOWED_EMPTY",
>>     new BytesRef(new byte[] {0x01})));
>>
>> And used finalQuery.add(
>>     new DocValuesFieldExistsQuery("GROUPS_ALLOWED_EMPTY"),
>>     BooleanClause.Occur.SHOULD);
>>
>> On Fri, Nov 13, 2020 at 2:09 PM Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>> > Maybe NormsFieldExistsQuery as a MUST_NOT clause?  Though, you must enable
>> > norms on your field to use that.
>> >
>> > TermRangeQuery is indeed a horribly costly way to execute this, but if you
>> > cache the result on each refresh, perhaps it is OK?
>> >
>> > You could also index a dedicated doc values field indicating that the
>> > field is empty and then use DocValuesFieldExistsQuery.
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> >
>> > On Fri, Nov 13, 2020 at 7:56 AM Rob Audenaerde <rob.audenae...@gmail.com>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> We have implemented some security on our index by adding a field
>> >> 'groups_allowed' to documents, and wrapping a boolean must query around
>> >> the original query that checks if one of the given user groups matches
>> >> at least one groups_allowed value.
>> >>
>> >> We chose to leave the groups_allowed field empty when the document should
>> >> be retrievable by all users, so we also need to select a document if
>> >> 'groups_allowed' is empty.
>> >>
>> >> What would be the fastest Query construction to do so?
>> >>
>> >>
>> >> Currently I use a TermRangeQuery that basically matches all values and
>> >> put that in a MUST_NOT combined with a MatchAllDocsQuery, but that gets
>> >> rather slow when the number of groups is high.
>> >>
>> >> Thanks!
>> >>
>> >
>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
