Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.

2020-07-27 Thread Adrien Grand
It's interesting that you're not seeing the same slowdown on the other field. How hard would it be for you to test the performance with the digest algorithm names lowercased, i.e., "md5;[md5 value in hex]", etc.? The reason I'm asking is because the compression logic is optimized for lowerc
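The experiment suggested above amounts to normalizing terms to lowercase before indexing them. A minimal sketch, assuming terms have the form "ALGORITHM;hexvalue"; the class `DigestTerms` and both of its methods are hypothetical helpers for illustration, not part of Lucene:

```java
import java.util.Locale;

public class DigestTerms {
    // Hypothetical helper: assumes terms look like "ALGORITHM;hexvalue",
    // e.g. "MD5;9E107D9D...". Locale.ROOT keeps the result independent of
    // the JVM's default locale (e.g. the Turkish dotless "i").
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Hypothetical check: true if the term only contains lowercase ASCII
    // letters, digits, '-' (as in "sha-256") and the ';' separator.
    static boolean isLowercaseDigestTerm(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                    || c == '-' || c == ';';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String indexed = normalize("MD5;9E107D9D372BB6826BD81D3542A419D6");
        System.out.println(indexed);                        // md5;9e107d9d372bb6826bd81d3542a419d6
        System.out.println(isLowercaseDigestTerm(indexed)); // true
    }
}
```

The same normalization would have to be applied to the lookup term before building the `BytesRef` passed to `TermsEnum.seekExact`, otherwise lookups for the original mixed-case values would miss.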

2020-07-27 Thread Trejkaz
Yep, the timings posted were the fastest of 10 runs in a row. The profiling was done in the middle of 1000 iterations in a row, just to rule out any warm-up time. The sort of data we're storing in the field is quite possibly a worst-case scenario for the compression. The data is mixed diges
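The measurement approach described above (discard warm-up, then keep the fastest of several runs) can be sketched as a small harness. This is an illustration under assumptions, not the code used in the thread: `BestOfRuns` and `bestOfNanos` are hypothetical names, and in a real benchmark the body would perform the `seekExact` lookups being timed:

```java
public class BestOfRuns {
    // Hypothetical harness: executes `body` `iterations` times per run.
    // The first `warmupRuns` runs are discarded so JIT compilation does not
    // skew the numbers; of the remaining `runs`, the fastest is returned.
    static long bestOfNanos(int warmupRuns, int runs, int iterations, Runnable body) {
        for (int r = 0; r < warmupRuns; r++) {
            for (int i = 0; i < iterations; i++) {
                body.run();
            }
        }
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                body.run();
            }
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real benchmark would call
        // termsEnum.seekExact(term) for each term under test here.
        long best = bestOfNanos(1, 10, 1000, () -> Math.sqrt(42.0));
        System.out.println("best run: " + best + " ns");
    }
}
```

Taking the minimum rather than the mean filters out noise such as GC pauses. A hand-rolled loop like this does not guard against dead-code elimination; a framework such as JMH handles that and warm-up more rigorously.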

2020-07-27 Thread Adrien Grand
Alex, the issue you linked is about the terms dictionary of doc values. Trejkaz linked the correct issue, which is about the terms dictionary of the inverted index. It's interesting you're seeing so much time spent in readVInt on 8.5 since there is a single vint that is read for each block in "Low
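To see why one vInt per block should be cheap, here is a standalone reimplementation of the variable-length int format that Lucene's `DataOutput.writeVInt` / `DataInput.readVInt` document: 7 payload bits per byte, low-order group first, high bit set on every byte except the last. The `VInt` class is a hypothetical illustration over a plain `byte[]`, not Lucene's own code:

```java
public class VInt {
    // Encodes `value` starting at out[pos] and returns the position after
    // the last byte written. Uses 1 to 5 bytes depending on magnitude.
    static int write(int value, byte[] out, int pos) {
        while ((value & ~0x7F) != 0) {
            out[pos++] = (byte) ((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out[pos++] = (byte) value; // final byte has the high bit clear
        return pos;
    }

    // Decodes one vInt starting at in[pos[0]] and advances pos[0] past it.
    static int read(byte[] in, int[] pos) {
        byte b = in[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[5];
        int end = write(300, buf, 0);
        int decoded = read(buf, new int[] {0});
        System.out.println("encoded 300 in " + end + " bytes, decoded " + decoded);
        // prints "encoded 300 in 2 bytes, decoded 300"
    }
}
```

Each decode is a short loop over at most five bytes, so if `readVInt` dominates a profile, it is more plausibly being called many times (for example once per term rather than once per block) than being slow per call.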