Thanks, Joel, that is exactly what we are doing. We have four shards and are sharding on the collapse key. Performance is fine (subsecond) as long as the result set is relatively small. I am really looking for the best way to ensure that this is always true.
On Wed, Mar 23, 2022 at 10:18 PM Joel Bernstein <joels...@gmail.com> wrote:

> To collapse on 30 million distinct values is going to cause memory problems
> for sure. If the heap is growing as the result set grows that means you are
> likely using a newer version of Solr which collapses into a hashmap. Older
> versions of Solr would collapse into an array 30 million in length which
> probably would have blown up memory with even small result sets.
>
> I think you're going to need to shard to get this to perform well. With
> SolrCloud you can shard on the collapse key (
> https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing
> ).
> This will send all documents with the same collapse key to the same shard.
> Then run the collapse query on the sharded collection.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
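For anyone following along, a minimal sketch of the routing-plus-collapse approach Joel describes. The collection name (`mycollection`), field name (`group_id`), and document IDs are hypothetical; the mechanics are standard SolrCloud compositeId routing (`<routeKey>!<docId>` in the `id` field) and the collapse query parser:

```shell
# Index with a compositeId route key prefix so every document sharing a
# collapse key hashes to the same shard (names below are examples only)
curl -X POST 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "group42!doc1", "group_id": "group42"},
       {"id": "group42!doc2", "group_id": "group42"}]'

# Collapse on the same key; since all members of a group live on one shard,
# each shard collapses independently and no cross-shard merging of groups is needed
curl -G 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq={!collapse field=group_id}'
```

The key point is that the router hashes only the part of the `id` before the `!`, so co-location of a group is guaranteed, and the per-shard collapse map only has to hold the distinct keys present on that shard rather than all 30 million.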