If I understand it correctly, the Zoie library [1][2] implements the "sledgehammer" approach by collecting docValues for all documents when a segment reader is opened. If you have some RAM to throw at the problem, this could indeed bring you an acceptable level of performance.
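
To make the idea concrete, here is a minimal sketch of such an in-memory mapping. It is not Zoie's actual code and it deliberately avoids the Lucene API: the class names SegmentIdMapper and IndexIdMapper are made up for illustration, and it assumes the per-document IDs have already been read into a long[] per segment (for example from a numeric doc values field), that each ID is unique, and that deletions can be ignored. The lookup probes the largest segment first and stops at the first hit, in line with the suggestions quoted below.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical per-segment map from application ID to segment-local doc ID. */
final class SegmentIdMapper {
    private final long[] sortedIds; // application IDs in ascending order
    private final int[] docIds;     // docIds[i] is the local doc ID holding sortedIds[i]

    /** idsByDoc[doc] is the application ID stored for that document (assumed unique). */
    SegmentIdMapper(long[] idsByDoc) {
        int n = idsByDoc.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the doc IDs by the application ID they carry.
        Arrays.sort(order, Comparator.comparingLong(d -> idsByDoc[d]));
        sortedIds = new long[n];
        docIds = new int[n];
        for (int i = 0; i < n; i++) {
            docIds[i] = order[i];
            sortedIds[i] = idsByDoc[order[i]];
        }
    }

    int size() {
        return sortedIds.length;
    }

    /** Returns the segment-local doc ID, or -1 if this segment does not contain the ID. */
    int docIdFor(long id) {
        int pos = Arrays.binarySearch(sortedIds, id);
        return pos >= 0 ? docIds[pos] : -1;
    }
}

/** Hypothetical index-wide lookup that probes the largest segment first. */
final class IndexIdMapper {
    private final SegmentIdMapper[] mappers;
    private final int[] docBases;       // doc ID offset of each segment
    private final Integer[] probeOrder; // segment indices, largest segment first

    /** idsBySegment[s][doc] is the application ID of document doc in segment s. */
    IndexIdMapper(long[][] idsBySegment) {
        int n = idsBySegment.length;
        SegmentIdMapper[] m = new SegmentIdMapper[n];
        int[] bases = new int[n];
        Integer[] order = new Integer[n];
        int base = 0;
        for (int s = 0; s < n; s++) {
            m[s] = new SegmentIdMapper(idsBySegment[s]);
            bases[s] = base;
            base += idsBySegment[s].length;
            order[s] = s;
        }
        Arrays.sort(order, Comparator.comparingInt((Integer s) -> m[s].size()).reversed());
        mappers = m;
        docBases = bases;
        probeOrder = order;
    }

    /** Returns the index-wide doc ID, or -1 if no segment contains the ID. */
    int docIdFor(long id) {
        for (int s : probeOrder) {
            int local = mappers[s].docIdFor(id);
            if (local >= 0) {
                return docBases[s] + local; // stop at the first segment with a hit
            }
        }
        return -1;
    }
}
```

The cost is roughly 12 bytes per document (one long plus one int), paid again whenever a new segment reader is opened, which is why this only makes sense if you have RAM to spare.
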

[1] http://senseidb.github.io/zoie/
[2] https://github.com/senseidb/zoie/blob/master/zoie-core/src/main/java/proj/zoie/api/impl/DocIDMapperImpl.java

On Sun, Aug 9, 2015 at 9:41 AM, Trejkaz <trej...@trypticon.org> wrote:

> On Fri, Aug 7, 2015 at 5:34 PM, Adrien Grand <jpou...@gmail.com> wrote:
> > Does your application actually iterate in order over dense ids, or is
> > it just for benchmarking purposes? Because if it does, you probably
> > don't actually need seeking, you could just see what the current ID in
> > the terms enum is.
>
> Both dense ID fetches and individual ID fetches exist in the
> application. I put them in a benchmark deliberately doing it as
> individual fetches to get an idea of average timing for a single
> operation.
>
> There are so many use cases of doing the individual fetches that it's
> tough to enumerate. The first one I found was "fetch the term vector
> for ID + field" but I'm sure there will be tons of them.
>
> For mapping a dense set of IDs to doc IDs (e.g. for filtering), I
> would probably use something like DocValuesTermsQuery for that to get
> them all in one shot. I also wondered whether writing our filters as
> queries would help, but I think it would turn out to be about as fast
> as DocValuesTermsQuery even if I did that.
>
> I'm sure the only way to really improve the speed of these filters is
> to start storing these things in the text index and use query-time
> joins, but I can't do that until I solve the issue of relying on
> stable doc IDs and it seems like trying to solve two large problems in
> a single commit would be biting off more than I can chew.
>
> > If you actually need seeking, then you should try
> > to avoid MultiFields, it will call seekExact on each segment, while
> > given what I see you could just stop after you found one segment with
> > the value.
>
> Ah, I did wonder whether MultiFields had any behaviour like that, so
> that definitely means that I will avoid using it. Then I can try other
> tricks, like trying the seeks in order of segment size (the largest
> segment is most likely to contain the hit.)
>
> TX

--
András