Hello all,
When using Lucene 5.X's group facet collectors (i.e.
*AbstractGroupFacetCollector* and the provided concrete implementation,
*TermGroupFacetCollector*), I repeatedly encounter OOM errors after
running a few search requests on an unsharded index consisting of a few
million documents. I first encountered the issue in Lucene 5.0.0 and still
see it in 5.2.1.
I've initialized three such collectors to accumulate values over
three different facet fields (all SortedNumericDV fields). The
collectors all look like the following:
==BEGIN CODE BLOCK==
AbstractGroupFacetCollector thisFacetCollector =
    TermGroupFacetCollector.createTermGroupFacetCollector(
        groupField, thisFacetField, facetFieldMultivalued,
        facetPrefix, initialSize);
==END CODE BLOCK==
Note that facetFieldMultivalued = false, facetPrefix = null, and
initialSize = 128. There are a few million unique groups indexed in the
group field. The heap blows up regardless of the number of unique
entries in the facet field (one of the facet fields, for example, has
fewer than 100 unique values).
I have confirmed that the heap ballooning /only/ occurs at collection
time (i.e. if I comment out the three TermGroupFacetCollector
assignments, I have no OOM issues; even with just one of them enabled,
the heap eventually hits OOM).
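For context, each request wires a collector into the search roughly like
the following sketch (variable and method names here are illustrative
placeholders, not my exact code):
==BEGIN CODE BLOCK==
// Illustrative sketch only; the other two collectors are built the same way.
void collectFacets(IndexSearcher searcher, Query query,
                   String groupField, String facetField) throws IOException {
    AbstractGroupFacetCollector collector =
        TermGroupFacetCollector.createTermGroupFacetCollector(
            groupField, facetField, false /* multivalued */,
            null /* prefix */, 128 /* initialSize */);

    // Collection phase -- this is where I see the heap balloon.
    searcher.search(query, collector);

    // Merge per-segment results and read back the top facet entries.
    AbstractGroupFacetCollector.GroupedFacetResult result =
        collector.mergeSegmentResults(10 /* size */, 0 /* minCount */,
                                      true /* orderByCount */);
    List<AbstractGroupFacetCollector.FacetEntry> top = result.getFacetEntries(0, 10);
}
==END CODE BLOCK==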
    Some additional system-related bits: I'm running Lucene 5.2.1 in a
dev environment with an ~8GB heap and 16GB of total RAM. I am not using
any special codecs. I've confirmed that the indexes (including the
sidecar facet indexes) are opened only once, during initialization of
the service. Both the main index and the sidecar facet index directories
are opened as NIOFSDirectory objects; I have also tried MMapDirectory
and see the same problem.
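The one-time initialization looks roughly like this (paths are
placeholders, and for brevity the sidecar facet index is shown opened the
same way as the main index):
==BEGIN CODE BLOCK==
// Rough sketch of the service init; paths are placeholders.
Directory indexDir = new NIOFSDirectory(Paths.get("/data/main-index"));
Directory facetDir = new NIOFSDirectory(Paths.get("/data/facet-index"));

// Readers and the searcher are created once and reused for every request.
DirectoryReader indexReader = DirectoryReader.open(indexDir);
DirectoryReader facetReader = DirectoryReader.open(facetDir);
IndexSearcher searcher = new IndexSearcher(indexReader);
==END CODE BLOCK==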
    After profiling the heap extensively and reading the Lucene group
faceting source code, I suspect that the DVs (for both the group and
facet fields) and/or the arrays used to accumulate facet counts remain
memory-resident. After executing the same set of queries multiple times,
I see heap usage balloon by 1-2GB at a time. I've tried segmenting the
index; while that reduces heap usage for ad-hoc searches, it does not
get rid of the OOM issue.
Any help here would be greatly appreciated. Many thanks in advance.
--A.