Collapsing on 30 million distinct values is going to cause memory problems
for sure. If the heap is growing as the result set grows, you are likely
on a newer version of Solr, which collapses into a hashmap, so memory
scales with the result set. Older versions of Solr collapsed into an array
30 million entries long, which would probably have blown up memory even
with small result sets.
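
For reference, this is the kind of collapse in question, expressed as a
filter query (the groupId field name here is a placeholder for your own
collapse field):

    q=*:*&fq={!collapse field=groupId}

Each group of documents sharing a groupId value is reduced to a single
head document, which is why the collapse structure has to track the
distinct values it encounters.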

I think you're going to need to shard to get this to perform well. With
SolrCloud you can shard on the collapse key (see
https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing).
This sends all documents with the same collapse key to the same shard.
Collapse is evaluated independently on each shard, so co-locating each
group on a single shard is what keeps the collapsed results correct. Then
run the collapse query on the sharded collection, as sketched below.
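
As a rough sketch (collection name, config name, and shard count are
placeholders, not recommendations), the collection would be created with
router.field pointing at the collapse key:

    http://localhost:8983/solr/admin/collections?action=CREATE&name=collapsed&collection.configName=myconfig&numShards=8&router.field=groupId

and the collapse query then runs unchanged against it:

    http://localhost:8983/solr/collapsed/select?q=*:*&fq={!collapse field=groupId}

One caveat with router.field: every document you index must have a value
in that field, since it is what the router hashes to pick a shard.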

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Mar 23, 2022 at 9:42 PM Jeremy Buckley - IQ-C
<jeremy.buck...@gsa.gov.invalid> wrote:

> The number of documents in the collection is about 90 million. The
> collapse field has about 30 million distinct values, so I guess that
> qualifies as high cardinality.  We used to use result grouping but switched
> to collapse for improved performance.
>
> The faceting fields are more of a mix, 5-10 fields ranging from around a
> dozen to around 250,000 distinct values.
>
> On Wed, Mar 23, 2022 at 8:30 PM Joel Bernstein <joels...@gmail.com> wrote:
>
> > It sounds like you are collapsing on a high cardinality field and/or
> > faceting on high cardinality fields. Can you describe the cardinality of
> > the fields so we can get an idea of how large the problem is?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
>
