Collapsing on 30 million distinct values is going to cause memory problems for sure. If the heap grows as the result set grows, you are likely on a newer version of Solr, which collapses into a hashmap. Older versions of Solr collapsed into an array 30 million slots long, which probably would have blown up memory even with small result sets.
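For reference, this is the kind of collapse filter query we're talking about; the field name here is just a placeholder:

    fq={!collapse field=collapse_key}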
I think you're going to need to shard to get this to perform well. With SolrCloud you can shard on the collapse key (https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing). This will send all documents with the same collapse key to the same shard. Then run the collapse query on the sharded collection (see the sketch below the quoted thread).

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Mar 23, 2022 at 9:42 PM Jeremy Buckley - IQ-C
<jeremy.buck...@gsa.gov.invalid> wrote:

> The number of documents in the collection is about 90 million. The
> collapse field has about 30 million distinct values, so I guess that
> qualifies as high cardinality. We used to use result grouping but switched
> to collapse for improved performance.
>
> The faceting fields are more of a mix, 5-10 fields ranging from around a
> dozen to around 250,000 distinct values.
>
> On Wed, Mar 23, 2022 at 8:30 PM Joel Bernstein <joels...@gmail.com> wrote:
>
> > It sounds like you are collapsing on a high cardinality field and/or
> > faceting on high cardinality fields. Can you describe the cardinality of
> > the fields so we can get an idea of how large the problem is?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
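A rough sketch of the routing approach, assuming the default compositeId router; the collection name, field name, and ids are placeholders. Prefixing each document id with its collapse key ("ACME-42!" below) makes the router co-locate the whole group on one shard:

    # index: the "ACME-42!" prefix routes both docs to the same shard
    curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id":"ACME-42!doc1","collapse_key":"ACME-42"},
           {"id":"ACME-42!doc2","collapse_key":"ACME-42"}]'

    # query: collapse runs per shard, and each shard only has to track
    # the collapse keys that live on it
    curl 'http://localhost:8983/solr/mycollection/select' \
      --data-urlencode 'q=*:*' \
      --data-urlencode 'fq={!collapse field=collapse_key}'

Since collapse is a per-shard operation, co-locating each key's documents is what keeps the results correct, and it also splits the 30 million keys across shards so no single node has to track all of them.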