+1. If we had a new BulkAdder that could detect long runs of set bits, couldn't it also be used in LRUQueryCache to cache dense doc sets, instead of always building a huge BitSet sized at maxDoc?
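For illustration, one hypothetical shape such a cache entry could take (all names here are made up, not existing Lucene classes): the dense matches stored as sorted, disjoint [start, end) runs, so a segment where nearly every doc matches costs a few ints rather than maxDoc bits.

// Hypothetical sketch, not a Lucene API: a cached doc id set stored as
// sorted, disjoint [start, end) runs instead of a FixedBitSet sized at maxDoc.
final class RunDocIdSet {
  private final int[] starts; // run start doc ids, ascending
  private final int[] ends;   // exclusive run ends; ends[i] > starts[i]

  RunDocIdSet(int[] starts, int[] ends) {
    this.starts = starts;
    this.ends = ends;
  }

  /** True if docId falls inside one of the runs; binary search over runs. */
  boolean contains(int docId) {
    int lo = 0;
    int hi = starts.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (docId < starts[mid]) {
        hi = mid - 1;
      } else if (docId >= ends[mid]) {
        lo = mid + 1;
      } else {
        return true;
      }
    }
    return false;
  }
}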
Xugang
https://www.amazingkoala.com.cn

> On Nov 4, 2022, at 08:15, Michael Froh <[email protected]> wrote:
>
> Hi,
>
> I was recently poking around in the createWeight implementation for
> MultiTermQueryConstantScoreWrapper to get to the bottom of some slow queries,
> and I realized that the worst-case performance could be pretty bad, but
> (maybe) possible to optimize for.
>
> Imagine if we have a segment with N docs and our MultiTermQuery expands to
> hit M terms, where each of the M terms matches N-1 docs. (If we matched all N
> docs, then Greg Miller's recent optimization to replace the MultiTermQuery
> with a TermQuery would kick in.) In this case, my understanding is that we
> would iterate through all the terms and pass each one's postings to a
> DocIdSetBuilder, which will iterate through the postings to set bits. This
> whole thing would be O(MN), I think.
>
> I was thinking that it would be cool if the DocIdSetBuilder could detect long
> runs of set bits and advance() each DISI to skip over them (since they're
> guaranteed not to contribute anything new to the union). In the worst case
> that I described above, I think it would make the whole thing O(M log N)
> (assuming advance() takes log time).
>
> At the risk of overcomplicating things, maybe DocIdSetBuilder could use a
> third ("dense") BulkAdder implementation that kicks in once enough bits are
> set, to efficiently implement the "or" operation to skip over known
> (sufficiently long) runs of set bits?
>
> Would something like that be useful? Is the "dense union of doc IDs" case
> common enough to warrant it?
>
> Thanks,
> Froh
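For reference, a minimal, self-contained sketch of the run-skipping union described above. It uses java.util.BitSet plus a made-up DocIterator interface and IntArrayIterator class as stand-ins for Lucene's DocIdSetIterator and a real postings list; none of this is DocIdSetBuilder's actual API.

import java.util.BitSet;

public class RunAwareUnionSketch {

  /** Made-up stand-in for Lucene's DocIdSetIterator: only what the sketch needs. */
  interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    int nextDoc(); // next doc id, or NO_MORE_DOCS

    int advance(int target); // first doc id >= target; real postings do this in ~log time
  }

  /** Array-backed iterator standing in for one term's postings list. */
  static final class IntArrayIterator implements DocIterator {
    private final int[] docs;
    private int idx = -1;

    IntArrayIterator(int... docs) {
      this.docs = docs;
    }

    @Override
    public int nextDoc() {
      return ++idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    @Override
    public int advance(int target) {
      // Linear scan for simplicity; real postings use skip data.
      while (++idx < docs.length) {
        if (docs[idx] >= target) {
          return docs[idx];
        }
      }
      return NO_MORE_DOCS;
    }
  }

  /**
   * OR one iterator into the accumulated bits, but when the current doc lands
   * inside a run of already-set bits, advance() straight past the run, since
   * docs inside it cannot contribute anything new to the union.
   */
  static void orInto(BitSet bits, DocIterator it) {
    int doc = it.nextDoc();
    while (doc != DocIterator.NO_MORE_DOCS) {
      if (bits.get(doc)) {
        int runEnd = bits.nextClearBit(doc); // first unset bit after the run
        doc = it.advance(runEnd);
      } else {
        bits.set(doc);
        doc = it.nextDoc();
      }
    }
  }

  public static void main(String[] args) {
    // The worst case from the mail in miniature: each "term" matches almost
    // every doc, so the second and third unions are mostly advance() skips.
    BitSet bits = new BitSet();
    orInto(bits, new IntArrayIterator(0, 1, 2, 3, 4, 5, 6, 8, 9));
    orInto(bits, new IntArrayIterator(0, 1, 2, 3, 4, 5, 6, 7, 9));
    orInto(bits, new IntArrayIterator(1, 2, 3, 4, 5, 6, 7, 8, 9));
    System.out.println(bits.cardinality()); // prints 10
  }
}

The key design point is the bits.get(doc) branch: once the union is dense, most docs fall inside an existing run, and each run then costs one nextClearBit plus one advance() instead of one set() per doc, which is where the proposed O(MN) to O(M log N) improvement would come from.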
