Switching to "FST50" ought to bring back much of the benefit of "Memory".
On Thu, Aug 23, 2018 at 5:15 PM Adrien Grand wrote:

> The commit that caused this slowdown might be
> https://github.com/mikemccand/luceneutil/commit/1d8460f342f269c98047def9f9eb76213acae5d9
>
> We don't have anything that performs as well anymore indeed, but I'm not
> sure this is a big deal. I would suspect that there were not many users of
> that postings format, one reason being that it was not supported in terms
> of backward compatibility (like any codec but the default one) and another
> reason being that it used a lot of RAM. In a number of cases, we try to
> fold benefits of alternative codecs in the default codec, for instance we
> used to have a "pulsing" postings format that could record postings in the
> terms dictionary in order to save one disk seek, and we ended up folding
> this feature into the default postings format by only enabling it on terms
> that have a document frequency of 1 and index_options=DOCS_ONLY, so that it
> would be always used with primary keys. For that postings format, it didn't
> really make sense as the way that it managed to be so much faster was by
> loading much more information in RAM, which we don't want to do with the
> default codec.
>
> On Thu, Aug 23, 2018 at 22:40, Michael Sokolov wrote:
>
>> I happened to stumble across this chart
>> https://home.apache.org/~mikemccand/lucenebench/PKLookup.html showing a
>> pretty drastic drop in this benchmark on 5/13. I looked at the commits
>> between the previous run and this one and did some investigation, trying to
>> do some git bisect to find the problem using benchmarks as a test, but it
>> proved to be quite difficult due to a breaking change re: MemoryCodec that
>> also required corresponding changes in benchmark code.
>>
>> In the end, I think removing MemoryCodec is what caused the drop in perf
>> here, based on this comment in benchmark code:
>>
>> '2011-06-26'
>> Switched to MemoryCodec for the primary-key 'id' field so that lookups
>> (either for PKLookup test or for deletions during reopen in the NRT test)
>> are fast, with no IO. Also switched to NRTCachingDirectory for the NRT
>> test, so that small new segments are written only in RAM.
>>
>> I don't really understand the implications here beyond benchmarks, but it
>> does seem that perhaps some essential high-performing capability has been
>> lost? Is there some equivalent thing remaining after MemoryCodec's removal
>> that can be used for primary keys?
>>
>> -Mike

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
