On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand <jpou...@gmail.com> wrote:
> The default codec stores numeric doc values by blocks of 4096 values
> that have independent numbers of bits per value. If you end up having
> most of these blocks empty, doc values will require little space but
> in a worst-case scenario where each block contains 1 single value, it
> is true that memory and disk usage will be very inefficient.
>

Also, the default codec has a performance hack (depending on
acceptableOverHead) for optimizing the single-byte case (e.g. norms or
some other smallfloat scoring factor). In this case it doesn't even use
BlockPackedWriter at all.

That's why I recommended the diskdv codec instead... the concepts are the
same, but it's not yet "optimized", so it's easier to understand what's
going on :)
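
To make the block math concrete, here is a rough sketch of the cost model described in the quoted text. It is not Lucene code: the class and method names are made up for illustration, and it assumes a simplified scheme in which each block of 4096 values is stored as deltas from the block minimum, using the same number of bits for every slot in the block (missing docs counted as 0). Per-block headers are ignored.

```java
// Hypothetical illustration only; not part of Lucene.
class BlockPackingSketch {
    // Block size taken from the discussion above.
    static final int BLOCK_SIZE = 4096;

    // Bits needed to store a delta, treating it as unsigned.
    // A delta of 0 needs no bits; an overflowed (negative) delta needs 64.
    static int bitsRequired(long maxDelta) {
        return maxDelta == 0 ? 0 : 64 - Long.numberOfLeadingZeros(maxDelta);
    }

    // Estimated payload size in bits: each block pays
    // bitsRequired(max - min) for every slot it contains.
    static long estimatePackedBits(long[] values) {
        long totalBits = 0;
        for (int start = 0; start < values.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, values.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            totalBits += (long) bitsRequired(max - min) * (end - start);
        }
        return totalBits;
    }

    public static void main(String[] args) {
        // Two blocks with no values at all.
        long[] empty = new long[2 * BLOCK_SIZE];

        // Worst case: exactly one real value per block.
        long[] sparse = new long[2 * BLOCK_SIZE];
        sparse[0] = 1_000_000L;
        sparse[BLOCK_SIZE] = 1_000_000L;

        System.out.println("empty blocks:  " + estimatePackedBits(empty) + " bits");
        System.out.println("sparse blocks: " + estimatePackedBits(sparse) + " bits");
    }
}
```

Under these assumptions an all-empty block costs nothing, while a block holding a single 20-bit value pays 20 bits for each of its 4096 slots, roughly 10 KB for one real value.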