Re: Slow doc/pos file merges...

2014-12-09 Thread Michael McCandless
Typically the vast majority of terms will in fact have docFreq < 128, but a few very high-frequency terms may have many 128-entry blocks, and it's those "costly" terms that you want decoding to be fast for. We encode that last partial block as vInt because we don't want to fill 0s into the unoccupied part of t
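
To make the trade-off above concrete, here is a minimal Java sketch of the two encodings being compared: a byte-at-a-time vInt for a short tail, and fixed-width bit packing for a full block of 128 values. This is not Lucene's code. The class name `PostingsEncodingSketch` and all of its methods are hypothetical and written only for illustration, and the sketch assumes the values are non-negative doc-ID deltas.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: hypothetical helper, not part of Lucene.
final class PostingsEncodingSketch {
    static final int BLOCK_SIZE = 128; // block size mentioned in the thread

    // vInt: 7 payload bits per byte; a set high bit means more bytes follow.
    static byte[] encodeVInts(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    // Decoding needs a data-dependent loop per value, which is the CPU cost.
    static int[] decodeVInts(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int shift = 0;
            int v = 0;
            byte b;
            do {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = v;
        }
        return values;
    }

    // Smallest bit width that can hold every value in the block (at least 1).
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Fixed-width packing: every value takes exactly `bits` bits.
    static long[] packBlock(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long v = values[i] & 0xFFFFFFFFL;
            packed[word] |= v << offset;
            if (offset + bits > 64) { // value straddles two 64-bit words
                packed[word + 1] |= v >>> (64 - offset);
            }
        }
        return packed;
    }

    // Unpacking uses the same shift and mask for every value, with no
    // per-value branching on the data itself.
    static int[] unpackBlock(long[] packed, int bits, int count) {
        int[] values = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long v = packed[word] >>> offset;
            if (offset + bits > 64) {
                v |= packed[word + 1] << (64 - offset);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }
}
```

Under this sketch, a term with docFreq = 300 would split into two packed blocks of 128 values plus a 44-value tail written with `encodeVInts`, so the tail occupies only the bytes it needs instead of a zero-padded block.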

Re: Slow doc/pos file merges...

2014-12-09 Thread Ravikumar Govindarajan
We have identified the reason for the slowness... Lucene41PostingsWriter encodes postings lists as VInt when block-size < 128 and takes a FOR coding approach otherwise... Most of our terms fall under VInt and that's why decompression during merge-reads was eating up a lot of CPU cycles... We switche

Slow doc/pos file merges...

2014-11-17 Thread Ravikumar Govindarajan
Hi, I am finding that Lucene is slowing down a lot when bigger and bigger doc/pos files are merged... While that's normally the case, the worrying part is all my data is in RAM. Version is 4.6.1. Some sample statistics taken after instrumenting the SortingAtomicReader code, as we use a SortingMergePo