Yes, the number of documents is not very large (about 90 000), but the queries are very heavy. Although they're plain boolean queries, a typical one can produce tens of millions of hits. Run single-threaded, such a query takes ~20 seconds, which is too slow, so multithreading is vital for this task.
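A minimal plain-Java sketch of the per-segment pattern being discussed, with no Lucene dependency: one task per segment computes a partial result, and a final step merges the partials. Segments are simulated as hypothetical `int[]` arrays of document ids, and an `IntPredicate` stands in for the boolean query; in Igor's setup each task would instead iterate `SpanQuery.getSpans()` over its own segment. Because the merge waits on every `Future`, wall time is bounded by the slowest segment, which is why uneven segment sizes limit the speedup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

public class PerSegmentSearch {
    // One task per (simulated) segment counts matching docs; partial counts are then merged.
    static long countHits(List<int[]> segments, IntPredicate matches, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<Long>> partials = new ArrayList<>();
        for (int[] segment : segments) {
            partials.add(pool.submit(() -> {
                long hits = 0;
                for (int doc : segment) {
                    if (matches.test(doc)) hits++;
                }
                return hits;
            }));
        }
        // Merge step: blocks until every segment task is done, so the slowest segment dominates.
        long total = 0;
        for (Future<Long> partial : partials) total += partial.get();
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Hypothetical segments; "even doc id" plays the role of the query.
            List<int[]> segments = List.of(new int[]{1, 2, 3, 4}, new int[]{5, 6}, new int[]{7, 8, 9});
            System.out.println(countHits(segments, doc -> doc % 2 == 0, pool)); // prints 4
        } finally {
            pool.shutdown();
        }
    }
}
```

Lucene's `IndexSearcher` can also be constructed with an `ExecutorService`, which applies the same one-task-per-segment idea internally; the sketch above only models the scheduling, not Lucene's collectors.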
As you mentioned, merges are the source of non-uniform segment sizes. Since my index is fully static (whenever I need to re-index, I can rebuild it from scratch), I'm going to try NoMergePolicy with some reasonable maximum segment size. If there are any other multithreading caveats, I'd be glad to hear about them.

--
Best Regards,
Igor

02.04.2013, 18:07, "Adrien Grand" <jpou...@gmail.com>:
> On Tue, Apr 2, 2013 at 2:29 PM, Igor Shalyminov
> <ishalymi...@yandex-team.ru> wrote:
>
>> Hello!
>
> Hi Igor,
>
>> I have a ~20GB index and try to make a concurrent search over it.
>> The index has 16 segments, I run SpanQuery.getSpans() on each segment
>> concurrently.
>> I see really small performance improvement of searching concurrently. I
>> suppose, the reason is that the sizes of the segments are very non-uniform
>> (3 segments have ~20 000 docs each, and the others have less than 1 000
>> each).
>> How to make more uniformly sized segments (I now use just
>> writer.forceMerge(16)), and are multiple index segments the most important
>> thing in Lucene concurrency?
>
> Segments have non-uniform sizes by design. A segment is generated
> every time a flush happens (when the RAM buffer is full or if you
> explicitly call commit). When there are too many segments, Lucene
> merges some of them while new segments keep being generated as you add
> data. So the "flush" segments will always be small while segments
> resulting from a merge will be much larger since they contain data
> from several other segments.
>
> Even if segments are collected concurrently, IndexSearcher needs to
> merge the results of the collection of each segment in the end. Since
> your segments are very small (20000 docs), maybe the cost of
> initialization/merge is not negligible compared to single-segment
> collection.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
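A rough worked estimate of why the skewed layout described in the thread helps so little. It assumes, purely for illustration, that a segment's search cost is proportional to its document count, and it ignores the per-segment setup and merge overhead Adrien mentions. Segments are assigned to threads greedily, largest first (the LPT heuristic), and total work divided by the busiest thread's load gives an upper bound on speedup. The skewed sizes come from the thread (3 × ~20 000 docs, with the 13 small segments treated as 1 000 each); the uniform layout of 16 × 5 625 docs is a hypothetical even split of ~90 000 documents, the kind of layout Igor hopes to get with NoMergePolicy.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.PriorityQueue;

public class SegmentBalance {
    // Greedy LPT scheduling: give the largest remaining segment to the least-loaded thread.
    static long makespan(long[] segmentSizes, int threads) {
        long[] sorted = segmentSizes.clone();
        Arrays.sort(sorted);
        PriorityQueue<Long> loads = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) loads.add(0L);
        for (int i = sorted.length - 1; i >= 0; i--) {
            long least = loads.poll();
            loads.add(least + sorted[i]);
        }
        long busiest = 0;
        for (long load : loads) busiest = Math.max(busiest, load);
        return busiest;
    }

    // Upper bound on speedup, assuming cost is proportional to doc count (illustrative assumption).
    static double speedup(long[] segmentSizes, int threads) {
        long total = Arrays.stream(segmentSizes).sum();
        return (double) total / makespan(segmentSizes, threads);
    }

    public static void main(String[] args) {
        long[] skewed = new long[16];
        Arrays.fill(skewed, 1_000L);                   // 13 small segments
        skewed[0] = skewed[1] = skewed[2] = 20_000L;   // 3 large segments
        long[] uniform = new long[16];
        Arrays.fill(uniform, 5_625L);                  // hypothetical 90 000 / 16 split
        System.out.printf(Locale.ROOT, "skewed:  %.2fx%n", speedup(skewed, 16));  // skewed:  3.65x
        System.out.printf(Locale.ROOT, "uniform: %.2fx%n", speedup(uniform, 16)); // uniform: 16.00x
    }
}
```

Under these assumptions the skewed layout is capped at about 3.65× no matter how many threads are added, because each 20 000-doc segment is handled by a single thread; even with only 4 threads the bound is the same. The uniform layout scales with the thread count until the per-segment overhead starts to matter.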