DocValues make fast per-document value lookups possible, which is nice, but they also raise some interesting issues.
Assume there are 100M docs and 200 NumericDocValuesFields. This ends up using a huge amount of disk space and memory, even if there are only a few thousand values for each field. I guess this is because Lucene stores a value for every DocValues field of every document, using a variable-length encoding.

So in such a scenario, is it possible to store values only for the documents that actually have a value for a given DocValues field? Or does Lucene have a column storage mechanism, something like a hash map, for DocValues:

key: the docId that has a value for the DocValues field
value: the value of the DocValues field

I am using Lucene 4.2.1. Thanks.
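To make the idea concrete, here is a rough sketch of the kind of sparse, hash-map-like column I have in mind. It is plain Java, not Lucene code: the class `SparseColumn`, its methods, and the `missingValue` parameter are all hypothetical names made up for illustration, and I am not claiming Lucene offers anything like it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT a Lucene class or API.
// It keeps an entry only for documents that actually have a value.
final class SparseColumn {
    private final Map<Integer, Long> values = new HashMap<>();

    // Record a value for one document.
    void put(int docId, long value) {
        values.put(docId, value);
    }

    // Look up a document's value, falling back to missingValue
    // (an assumed default) when the document has none.
    long get(int docId, long missingValue) {
        Long v = values.get(docId);
        return v != null ? v : missingValue;
    }

    // Number of entries actually stored.
    int size() {
        return values.size();
    }

    // Slots a dense layout would need: one per document per field.
    static long denseSlots(long numDocs, long numFields) {
        return numDocs * numFields;
    }

    public static void main(String[] args) {
        SparseColumn column = new SparseColumn();
        column.put(17, 42L);
        column.put(99_999_999, 7L);

        System.out.println("stored entries: " + column.size());
        System.out.println("doc 17: " + column.get(17, 0L));
        System.out.println("doc 18 (no value): " + column.get(18, 0L));
        System.out.println("dense slots for 100M docs x 200 fields: "
                + denseSlots(100_000_000L, 200L));
    }
}
```

With 100M docs and 200 fields, a dense layout has 20 billion slots, while a sparse layout like this would only hold the few thousand entries per field that really exist. A boxed `HashMap` has its own per-entry overhead, so this is only meant to show the shape of the lookup, not a realistic storage format.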