Adrien and Robert, thanks a lot for the hints. I will try a few options and see how it goes.
On Tue, Apr 9, 2013 at 9:25 AM, Robert Muir <rcm...@gmail.com> wrote:
> On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand <jpou...@gmail.com> wrote:
>
> > The default codec stores numeric doc values by blocks of 4096 values
> > that have independent numbers of bits per value. If you end up having
> > most of these blocks empty, doc values will require little space but
> > in a worst-case scenario where each block contains 1 single value, it
> > is true that memory and disk usage will be very inefficient.
> >
>
> Also the default codec has a performance hack (depending on
> acceptableOverHead) for optimizing the single byte case (e.g. norms or
> other smallfloat scoring factor). In this case it doesn't even use
> blockpackedwriter at all.
>
> That's why I recommended diskdv codec instead... the concepts are the same
> but it's not yet "optimized" so it's easier to understand what's going on :)
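To get a feel for Adrien's worst case before trying options, here is a toy size estimate. It is a simplified model, not Lucene's actual BlockPackedWriter: only the 4096-value block size comes from the thread. The 64-bit per-block header (HEADER_BITS), the assumption that documents without a value are stored as 0, and all class and method names are hypothetical. Each block picks its own bits per value from its own min/max range, so one large value in an otherwise empty block forces all 4096 slots of that block to the wider width.

```java
// Toy model of per-block bit packing (NOT Lucene's real on-disk format).
class BlockPackedEstimate {
    // Block size quoted in the thread for the default codec.
    static final int BLOCK_SIZE = 4096;
    // Hypothetical fixed per-block overhead (stands in for min value + bits-per-value token).
    static final int HEADER_BITS = 64;

    /** Bits needed for an unsigned delta; 0 when every value in the block is equal. */
    static int bitsRequired(long delta) {
        return 64 - Long.numberOfLeadingZeros(delta);
    }

    /** Estimated size in bits: each block pays a header plus bitsPerValue * valuesInBlock. */
    static long estimateBits(long[] values) {
        long total = 0;
        for (int start = 0; start < values.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, values.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            // Each block chooses its width from its own range, independently of other blocks.
            int bpv = bitsRequired(max - min);
            total += HEADER_BITS + (long) bpv * (end - start);
        }
        return total;
    }

    /** numDocs docs; `count` of them (every `stride`-th doc) hold `value`, the rest are assumed 0. */
    static long[] docs(int numDocs, int count, long value, int stride) {
        long[] values = new long[numDocs];
        for (int k = 0; k < count; k++) {
            values[k * stride] = value;
        }
        return values;
    }

    public static void main(String[] args) {
        int numDocs = BLOCK_SIZE * 100;
        // The same 100 non-zero values, either packed into the first block or one per block.
        long clustered = estimateBits(docs(numDocs, 100, 1_000_000L, 1));
        long spread = estimateBits(docs(numDocs, 100, 1_000_000L, BLOCK_SIZE));
        System.out.println("clustered: " + clustered + " bits");
        System.out.println("spread:    " + spread + " bits");
    }
}
```

Under these assumptions it prints 88320 bits for the clustered layout and 8198400 bits for the spread one: the same 100 values (20 bits each) cost roughly 90 times more when every block has to widen to 20 bits per value.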