Re: Slow doc/pos file merges...

2014-12-09 Thread Michael McCandless
Typically the vast majority of terms will in fact have docFreq < 128, but a few very high-freq terms may have many full 128-entry blocks, and it's those "costly" terms that you want decode to be fast for. We encode that last partial block as vInt because we don't want to fill 0s into the unoccupied part of t
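
To make the split described above concrete, here is a minimal sketch of how a term's docFreq divides into full 128-entry blocks plus a vInt-encoded tail. The class and method names (BlockLayoutSketch, fullBlocks, vIntTail) are hypothetical and only illustrate the arithmetic; this is not Lucene's actual code, and it ignores any other special cases the real writer may handle.

```java
// Illustrative sketch only; names are hypothetical, not Lucene's API.
final class BlockLayoutSketch {
    // Block size mentioned in the thread.
    static final int BLOCK_SIZE = 128;

    // Number of complete blocks that can be bit-packed.
    static int fullBlocks(int docFreq) {
        return docFreq / BLOCK_SIZE;
    }

    // Number of leftover entries that fall into the vInt-encoded tail.
    static int vIntTail(int docFreq) {
        return docFreq % BLOCK_SIZE;
    }

    public static void main(String[] args) {
        for (int docFreq : new int[] {5, 127, 128, 1000}) {
            System.out.println("docFreq=" + docFreq
                + " fullBlocks=" + fullBlocks(docFreq)
                + " vIntTail=" + vIntTail(docFreq));
        }
    }
}
```

Under this sketch a term with docFreq < 128 has no packed block at all, so every one of its postings goes through the vInt path.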

Re: Slow doc/pos file merges...

2014-12-09 Thread Ravikumar Govindarajan
We have identified the reason for slowness... Lucene41PostingsWriter encodes a postings list as VInt when block-size < 128 and takes a FOR coding approach otherwise... Most of our terms fall under VInt and that's why decompression during merge-reads was eating up a lot of CPU cycles... We switche
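
For readers unfamiliar with VInt, the following is a rough sketch of the variable-length integer scheme: 7 payload bits per byte, low-order bits first, with the high bit marking "more bytes follow". The class VIntSketch and its helpers are hypothetical and written against plain java.io, not Lucene's DataOutput/DataInput. It is meant to show why decoding involves a data-dependent branch per byte, which is the sort of per-value work that could add up during merges.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; names are hypothetical, not Lucene's API.
final class VIntSketch {
    // Write one int as 1 to 5 bytes, 7 bits at a time, low-order bits first.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte has the high bit clear
    }

    // Read one int starting at pos[0]; pos[0] is advanced past it.
    static int readVInt(byte[] buf, int[] pos) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos[0]++];
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0); // branch taken once per encoded byte
        return result;
    }

    // Convenience helper: encode several values back to back.
    static byte[] encode(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeVInt(out, v);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] buf = encode(5, 300, 16384);
        int[] pos = {0};
        while (pos[0] < buf.length) {
            System.out.println(readVInt(buf, pos));
        }
    }
}
```

A fixed-width packed block, by contrast, can be decoded with the same bit width for every value in the block, so the loop has no per-value length check.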