The commit that caused this slowdown might be https://github.com/mikemccand/luceneutil/commit/1d8460f342f269c98047def9f9eb76213acae5d9 .
Indeed, we no longer have anything that performs as well, but I'm not sure this is a big deal. I suspect that postings format had few users, for two reasons: it was not supported in terms of backward compatibility (like any codec but the default one), and it used a lot of RAM.

In a number of cases, we try to fold the benefits of alternative codecs into the default codec. For instance, we used to have a "pulsing" postings format that could record postings in the terms dictionary in order to save one disk seek. We ended up folding this feature into the default postings format by only enabling it on terms that have a document frequency of 1 and index_options=DOCS_ONLY, so that it would always be used with primary keys.

For this postings format, however, folding it in didn't really make sense: it managed to be so much faster by loading much more information into RAM, which we don't want to do with the default codec.

On Thu, Aug 23, 2018 at 22:40, Michael Sokolov <[email protected]> wrote:

> I happened to stumble across this chart
> https://home.apache.org/~mikemccand/lucenebench/PKLookup.html showing a
> pretty drastic drop in this benchmark on 5/13. I looked at the commits
> between the previous run and this one and did some investigation, trying to
> do some git bisect to find the problem using benchmarks as a test, but it
> proved to be quite difficult due to a breaking change re: MemoryCodec that
> also required corresponding changes in benchmark code.
>
> In the end, I think removing MemoryCodec is what caused the drop in perf
> here, based on this comment in benchmark code:
>
> '2011-06-26'
> Switched to MemoryCodec for the primary-key 'id' field so that lookups
> (either for PKLookup test or for deletions during reopen in the NRT test)
> are fast, with no IO. Also switched to NRTCachingDirectory for the NRT
> test, so that small new segments are written only in RAM.
>
> I don't really understand the implications here beyond benchmarks, but it
> does seem that perhaps some essential high-performing capability has been
> lost? Is there some equivalent thing remaining after MemoryCodec's removal
> that can be used for primary keys?
>
> -Mike
>
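
PS: To make the "pulsing" condition from my reply concrete, here is a rough toy model in Python. It is only a sketch of the idea as I described it (inline the single doc ID in the terms dictionary when the document frequency is 1 and the field is DOCS_ONLY), not Lucene's actual code. All names here (ToyTermsDict, TermEntry, postings_reads) are made up for illustration, and a plain list stands in for the on-disk postings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DOCS_ONLY = "DOCS_ONLY"  # stand-in for the index option mentioned in the reply


@dataclass
class TermEntry:
    doc_freq: int
    singleton_doc_id: Optional[int] = None  # posting inlined in the terms dict
    postings_offset: Optional[int] = None   # pointer into the separate postings store


class ToyTermsDict:
    """Hypothetical terms dictionary; not a Lucene API."""

    def __init__(self, index_options: str) -> None:
        self.index_options = index_options
        self.terms: Dict[str, TermEntry] = {}
        self.postings_store: List[List[int]] = []  # pretend this lives on disk
        self.postings_reads = 0  # counts simulated extra seeks

    def add_term(self, term: str, doc_ids: List[int]) -> None:
        if len(doc_ids) == 1 and self.index_options == DOCS_ONLY:
            # "Pulse" the single doc ID into the dictionary entry itself.
            self.terms[term] = TermEntry(1, singleton_doc_id=doc_ids[0])
        else:
            # Otherwise store the postings separately and keep only a pointer.
            self.postings_store.append(list(doc_ids))
            offset = len(self.postings_store) - 1
            self.terms[term] = TermEntry(len(doc_ids), postings_offset=offset)

    def lookup(self, term: str) -> List[int]:
        entry = self.terms.get(term)
        if entry is None:
            return []
        if entry.singleton_doc_id is not None:
            return [entry.singleton_doc_id]  # answered without touching postings
        self.postings_reads += 1  # simulated extra seek
        return self.postings_store[entry.postings_offset]
```

In this model a primary-key lookup is answered from the dictionary entry alone, while a term that matches several documents still costs one read of the postings store. Note that this only saves a seek; unlike the removed format, it does not hold the whole terms dictionary in RAM.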
