Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-08-06 Thread Adrien Grand
I'm puzzled. I would have expected the digest-no-prefix times to be faster on JDK14 than on your older JDK for the same reason that digest-lower and digest-upper got faster. I wonder if part of the reason why the no-prefix variant is faster is because it is better at identifying that the digest d

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-30 Thread Trejkaz
On Mon, 27 Jul 2020 at 19:24, Adrien Grand wrote:
> It's interesting you're not seeing the same slowdown on the other field.
> How hard would it be for you to test what the performance is if you
> lowercase the name of the digest algorithms, ie. "md5;[md5 value in hex]",
> etc. The reason I'm as

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-27 Thread Adrien Grand
It's interesting you're not seeing the same slowdown on the other field. How hard would it be for you to test what the performance is if you lowercase the name of the digest algorithms, ie. "md5;[md5 value in hex]", etc. The reason I'm asking is because the compression logic is optimized for lowerc
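
The experiment suggested above can be tried without touching Lucene itself, by normalizing terms before they are indexed and before they are looked up. This is a minimal sketch, assuming terms have the form "ALGORITHM;hexvalue" with a single ';' separator. The class and method names are hypothetical and do not come from the thread.

```java
import java.util.Locale;

final class DigestTermSketch {

    // Hypothetical helper: lowercases only the algorithm name before the
    // first ';' and leaves the digest value itself untouched.
    // Terms without a ';' are returned unchanged.
    static String lowercaseAlgorithm(String term) {
        int sep = term.indexOf(';');
        if (sep < 0) {
            return term;
        }
        return term.substring(0, sep).toLowerCase(Locale.ROOT) + term.substring(sep);
    }

    public static void main(String[] args) {
        // Assumed input shape: upper-case algorithm name, hex digest value.
        System.out.println(lowercaseAlgorithm("MD5;d41d8cd98f00b204e9800998ecf8427e"));
    }
}
```

The same normalization would have to be applied at index time and at query time, otherwise seekExact would no longer find the stored terms.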

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-27 Thread Trejkaz
Yep, the timings posted were the best speed out of 10 runs in a row. The profiling was done in the middle of 1000 iterations in a row just to knock off any warm-up time. The sort of data we're storing in the field is quite possibly a worst-case scenario for the compression. The data is mixed diges
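
The "best speed out of 10 runs" measurement described above can be sketched roughly as follows. This is only an illustration of the approach, not the harness used in the thread. The class name, method name and the dummy workload are assumptions. In a real test the Runnable would wrap the seekExact loop.

```java
final class BestOfRunsSketch {

    // Runs the task the given number of times and returns the fastest
    // wall-clock time in nanoseconds. Taking the minimum discards runs
    // that were slowed down by warm-up, GC pauses or other noise.
    static long bestOfNanos(int runs, Runnable task) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        long[] sink = new long[1]; // keeps the result live so the loop is not optimized away
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            sink[0] = sum;
        };
        System.out.println("best of 10: " + bestOfNanos(10, work) + " ns");
    }
}
```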

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-27 Thread Adrien Grand
Alex, this issue you linked is about the terms dictionary of doc values. Trejkaz linked the correct issue which is about the terms dictionary of the inverted index. It's interesting you're seeing so much time spent in readVInt on 8.5 since there is a single vint that is read for each block in "Low
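
For readers unfamiliar with the vint mentioned above: Lucene's variable-length int stores seven payload bits per byte, least significant group first, with the high bit of each byte signalling that another byte follows. The following is a self-contained sketch of decoding that format from a byte array. It is not Lucene's own DataInput code, and the class name is hypothetical.

```java
final class VIntSketch {

    // Decodes one variable-length int starting at the given offset.
    // Each byte contributes its low 7 bits; a set high bit means
    // "more bytes follow". No bounds or overflow checks are done here.
    static int readVInt(byte[] buf, int offset) {
        int value = 0;
        int shift = 0;
        while (true) {
            byte b = buf[offset++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // 300 = 0b1_0010_1100: low 7 bits 0x2C plus continuation bit -> 0xAC, then 0x02.
        System.out.println(readVInt(new byte[] {(byte) 0xAC, 0x02}, 0)); // prints 300
    }
}
```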

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-26 Thread Alex K
Hi,
Also have a look here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9378
Seems it might be related.
- Alex
On Sun, Jul 26, 2020, 23:31 Trejkaz wrote:
> Hi all.
>
> I've been tracking down slow seeking performance in TermsEnum after
> updating to Lucene 8.5.1.
>
> On 8

Fwd: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-26 Thread Trejkaz
Hi all.
I've been tracking down slow seeking performance in TermsEnum after updating to Lucene 8.5.1.
On 8.5.1:
SegmentTermsEnum.seekExact: 33,829 ms (70.2%) (remaining time in our code)
  SegmentTermsEnumFrame.loadBlock: 29,104 ms (60.4%)
    CompressionAlgorithm$2.read: 25,78