Hello all,
When using Lucene 5.X's group facet collectors (i.e.
*AbstractGroupFacetCollector* and the provided concrete implementation,
*TermGroupFacetCollector*), I repeatedly encounter OOM errors after
running a few search requests on an unsharded index consisting of a few
million documents. I first encountered the issue in Lucene 5.0.0 and still
see it in 5.2.1.
I've initialized three such collectors to accumulate values over
three different facet fields (all SortedNumericDV fields). The
collectors all look like the following:
==BEGIN CODE BLOCK==
AbstractGroupFacetCollector thisFacetCollector =
    TermGroupFacetCollector.createTermGroupFacetCollector(
        groupField, thisFacetField, facetFieldMultivalued,
        facetPrefix, initialSize);
==END CODE BLOCK==
Note that facetFieldMultivalued = false, facetPrefix = null, and
initialSize = 128. There are a few million unique groups indexed in the
group field. The heap blows up regardless of the number of unique
entries in the facet field (one of the facet fields, for example, has
fewer than 100 unique values).
I have confirmed that the heap ballooning /only/ occurs at collection
time (i.e. if I comment out the three TermGroupFacetCollector
assignments, I have no OOM issues; even with just one of them enabled,
the heap eventually hits OOM).
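For context, each request wires a collector into the search roughly like
the following sketch (variable and method names here are illustrative
placeholders, not my exact code):
==BEGIN CODE BLOCK==
// Illustrative sketch only; the other two collectors are built the same way.
void collectFacets(IndexSearcher searcher, Query query,
                   String groupField, String facetField) throws IOException {
    AbstractGroupFacetCollector collector =
        TermGroupFacetCollector.createTermGroupFacetCollector(
            groupField, facetField, false /* multivalued */,
            null /* prefix */, 128 /* initialSize */);

    // Collection phase -- this is where I see the heap balloon.
    searcher.search(query, collector);

    // Merge per-segment results and read back the top facet entries.
    AbstractGroupFacetCollector.GroupedFacetResult result =
        collector.mergeSegmentResults(10 /* size */, 0 /* minCount */,
                                      true /* orderByCount */);
    List<AbstractGroupFacetCollector.FacetEntry> top = result.getFacetEntries(0, 10);
}
==END CODE BLOCK==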
    Some additional system-related bits: I'm running Lucene 5.2.1 in a
dev environment with an ~8GB heap and 16GB of total RAM. I am not using
any special codecs. I've confirmed that the indexes (including the
sidecar facet indexes) are opened only once, during initialization of
the service. Both the main index and the sidecar facet index directories
are opened as NIOFSDirectory objects; I have also tried MMapDirectory
and see the same problem.
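The one-time initialization looks roughly like this (paths are
placeholders, and for brevity the sidecar facet index is shown opened the
same way as the main index):
==BEGIN CODE BLOCK==
// Rough sketch of the service init; paths are placeholders.
Directory indexDir = new NIOFSDirectory(Paths.get("/data/main-index"));
Directory facetDir = new NIOFSDirectory(Paths.get("/data/facet-index"));

// Readers and the searcher are created once and reused for every request.
DirectoryReader indexReader = DirectoryReader.open(indexDir);
DirectoryReader facetReader = DirectoryReader.open(facetDir);
IndexSearcher searcher = new IndexSearcher(indexReader);
==END CODE BLOCK==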
    After profiling the heap extensively and reading the Lucene group
faceting source code, I suspect that the DVs (for both the group and
facet fields) and/or the arrays used to accumulate facet counts remain
memory-resident. After executing the same set of queries multiple times,
I see heap usage balloon by 1-2GB at a time. I've tried segmenting the
index; while that reduces heap usage for ad-hoc searches, it does not
get rid of the OOM issue.
Any help here would be greatly appreciated. Many thanks in advance.
--A.