This is a follow-up to the earlier thread I started about memory usage patterns of SegmentReader instances, but I decided to create a separate post since this issue is much more serious than the heap overhead created by stored field compression.
Here is the use case, once again. The index totals around 300M documents, with 7 string, 2 long, 1 integer, 1 date and 1 float fields, all of which are both indexed and stored. It is split into 4 shards, which are basically separate indices, if that matters. After the index is populated (but not optimized, since we don't do that), the overall heap usage by Lucene exceeds 1 GB, much of it held by instances of BlockTreeTermsReader. For instance, for the largest segment in one such index, the retained heap size of the internal tree map is around 50 MB. This is evident from heap dump analysis; I have screenshots I can post here if that helps. Since the index contains many segments of various sizes, the total heap usage for one shard, as expected, stands at around 280 MB.

Could someone shed some light on whether this is expected, and if so, how I could trim down memory usage here? Is there a way to switch to a different terms index implementation, one that doesn't preload all the terms into RAM, or does so only partially, i.e. as a cache? I'm not sure I'm framing my questions correctly, as I'm obviously not an expert on Lucene's internals, but this is going to become a critical issue for large-scale use cases of our system.
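In case it clarifies what I'm asking: below is a rough, untested sketch of the only knob I've found so far, namely plugging in a postings format with larger term block sizes so the in-heap terms index is sparser. This assumes Lucene 4.x; the codec class, the block-size values (128/256) and the index path are just placeholders I picked for illustration, not something I've verified.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SparserTermsIndexSketch {
    public static void main(String[] args) throws Exception {
        // Larger min/max term block sizes should mean fewer index blocks,
        // so the in-heap terms index held by BlockTreeTermsReader shrinks,
        // presumably at the cost of slower term lookups. 128/256 are
        // arbitrary values, not a recommendation.
        final PostingsFormat sparserTermsIndex = new Lucene41PostingsFormat(128, 256);

        // Per-field override hook on the default codec.
        Lucene46Codec codec = new Lucene46Codec() {
            @Override
            public PostingsFormat getPostingsFormatForField(String field) {
                return sparserTermsIndex;
            }
        };

        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46,
                new StandardAnalyzer(Version.LUCENE_46));
        iwc.setCodec(codec);

        // Hypothetical index location; only newly written segments would
        // pick up the custom postings format.
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/path/to/index")), iwc)) {
            // ... add/update documents as usual
        }
    }
}

Whether larger blocks would actually make a meaningful dent in the numbers above, or whether there is a better approach entirely (a different terms index implementation, a cache-based one, etc.), is exactly what I'm hoping someone can confirm.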