On Fri, Aug 7, 2015 at 5:34 PM, Adrien Grand <jpou...@gmail.com> wrote:
> Does your application actually iterate in order over dense ids, or is
> it just for benchmarking purposes? Because if it does, you probably
> don't actually need seeking, you could just see what the current ID in
> the terms enum is.
Both dense ID fetches and individual ID fetches exist in the application. I put them in a benchmark, deliberately doing them as individual fetches, to get an idea of the average timing for a single operation. There are so many use cases for individual fetches that it's tough to enumerate them. The first one I found was "fetch the term vector for ID + field", but I'm sure there will be tons of them.

For mapping a dense set of IDs to doc IDs (e.g. for filtering), I would probably use something like DocValuesTermsQuery to get them all in one shot. I also wondered whether writing our filters as queries would help, but I think it would turn out to be about as fast as DocValuesTermsQuery even if I did that. I'm sure the only way to really improve the speed of these filters is to start storing these things in the text index and use query-time joins, but I can't do that until I solve the issue of relying on stable doc IDs, and trying to solve two large problems in a single commit would be biting off more than I can chew.

> If you actually need seeking, then you should try
> to avoid MultiFields, it will call seekExact on each segment, while
> given what I see you could just stop after you found one segment with
> the value.

Ah, I did wonder whether MultiFields had any behaviour like that, so I will definitely avoid using it. Then I can try other tricks, like trying the seeks in order of segment size (the largest segment is most likely to contain the hit).

TX
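
For illustration, the "stop at the first segment that has the ID, and try the largest segments first" idea could look roughly like the sketch below. It is library-free on purpose: the Segment class, its seekExact method, and the lookup and largestFirst helpers are hypothetical stand-ins for per-segment terms lookups, not Lucene's actual API. It also assumes each ID lives in at most one segment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;

final class SegmentSeekSketch {

    /** Hypothetical stand-in for one segment: a sorted dictionary of ID -> segment-local doc. */
    static final class Segment {
        final int docBase;                          // offset of this segment's docs in the whole index
        final TreeMap<String, Integer> idToLocalDoc;
        int seeks = 0;                              // counts lookups so the early exit is observable

        Segment(int docBase, Map<String, Integer> ids) {
            this.docBase = docBase;
            this.idToLocalDoc = new TreeMap<>(ids);
        }

        /** Returns the segment-local doc for the ID, or empty if this segment does not contain it. */
        OptionalInt seekExact(String id) {
            seeks++;
            Integer local = idToLocalDoc.get(id);
            return local == null ? OptionalInt.empty() : OptionalInt.of(local);
        }

        int size() {
            return idToLocalDoc.size();
        }
    }

    /** Copies the segment list and sorts it so the largest segment is tried first. */
    static List<Segment> largestFirst(List<Segment> segments) {
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingInt((Segment s) -> s.size()).reversed());
        return ordered;
    }

    /** Seeks segment by segment and stops at the first hit; returns the index-wide doc ID or -1. */
    static int lookup(List<Segment> ordered, String id) {
        for (Segment segment : ordered) {
            OptionalInt local = segment.seekExact(id);
            if (local.isPresent()) {
                return segment.docBase + local.getAsInt();   // remaining segments are never touched
            }
        }
        return -1;
    }
}
```

The ordering only needs to be recomputed when the set of segments changes, so in this sketch it is done once up front and the sorted list is reused for every lookup.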