bruno-roustant commented on issue #701: LUCENE-8836 Optimize DocValues 
TermsDict to continue scanning from the last position when possible
URL: https://github.com/apache/lucene-solr/pull/701#issuecomment-501189876
 
 
   > Is this a fair test though? Doesn't it ignore the cost added by the 
optimization?
   
   The measurement does not ignore the cost added by the optimization. It counts 
every seek in the IndexInput, including those issued by the optimization, and it 
also counts the read of the next block's first term that the optimization adds 
(counted as both a seek and a term read). So the measurement itself is fair.
   
   That said, it would be better to also measure with tests that are closer to 
common use cases. I'll try to find some other classes to measure.
   
   
   > Do you have a use case you are targeting and can share results on?
   
   My use case is the following. We have a list of values (up to 200) that we 
look up in the DocValues index.
   If the index size is close to the number of values we search for, it is more 
efficient to iterate over the DocValues index sequentially with the TermsDict 
TermsEnum.next().
   But if the index is large, it is more efficient to sort the values in 
lexicographical order and seek each one in the TermsDict. That is where the 
optimization applies. In this case we gain at least 25% in performance.
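   
   As a rough illustration of the two strategies, here is a minimal sketch in 
plain Java. It is not Lucene code: a sorted String[] stands in for the 
TermsDict, and the names TermsLookupSketch, findByIteration and findBySeeking 
are hypothetical. Restricting each binary search to the range after the 
previous hit is a simplification specific to this sketch, not the block-level 
change made in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: a sorted array of unique terms stands in for the TermsDict. */
public class TermsLookupSketch {

    /** Strategy 1: walk every term once, similar to repeated TermsEnum.next() calls. */
    static List<Integer> findByIteration(String[] sortedTerms, String[] sortedValues) {
        List<Integer> ords = new ArrayList<>();
        int v = 0;
        for (int ord = 0; ord < sortedTerms.length && v < sortedValues.length; ord++) {
            // Searched values that sort before the current term are absent: skip them.
            while (v < sortedValues.length
                    && sortedValues[v].compareTo(sortedTerms[ord]) < 0) {
                v++;
            }
            if (v < sortedValues.length && sortedValues[v].equals(sortedTerms[ord])) {
                ords.add(ord);
                v++;
            }
        }
        return ords;
    }

    /** Strategy 2: seek each value; because the values are sorted, seeks only move forward. */
    static List<Integer> findBySeeking(String[] sortedTerms, String[] sortedValues) {
        List<Integer> ords = new ArrayList<>();
        int from = 0; // never search before the position reached by the previous seek
        for (String value : sortedValues) {
            if (from >= sortedTerms.length) {
                break; // every remaining value sorts after the last term
            }
            int idx = Arrays.binarySearch(sortedTerms, from, sortedTerms.length, value);
            if (idx >= 0) {
                ords.add(idx);
                from = idx + 1;
            } else {
                from = -idx - 1; // insertion point: first term greater than value
            }
        }
        return ords;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "grape", "kiwi", "lemon", "mango", "peach"};
        String[] values = {"banana", "fig", "kiwi", "peach"};
        System.out.println(findByIteration(terms, values)); // [1, 4, 7]
        System.out.println(findBySeeking(terms, values));   // [1, 4, 7]
    }
}
```

   Iteration touches every term, so its cost grows with the index size. Seeking 
costs roughly one search per value, so it is the better choice when the index is 
much larger than the list of values.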

