Are you determining the "top doc" for each collapsed group based on score? If instead the "top doc" is determined by a static field with a manageable number of values, you may have other options available to you. (For some use cases it can be acceptable to "pre-filter" the domain with creative fq params. This works iff your "collapse" can be considered a type of "deduplication", with doc priority determined by a static field; it's a non-starter if you know you need to search over the full uncollapsed domain.)
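To make the distinction concrete, here is a rough sketch of the two request shapes. The field names `group_id` and `priority` are hypothetical, and the pre-filter variant assumes each group has exactly one doc carrying the best `priority` value (here 1), which may not hold for your data:

```python
from urllib.parse import urlencode


def collapse_params(user_query, group_field, priority_field):
    """Query-time collapse: Solr keeps one doc per group, chosen here by the
    minimum value of a numeric field instead of by score."""
    return {
        "q": user_query,
        "fq": f"{{!collapse field={group_field} min={priority_field}}}",
    }


def prefilter_params(user_query, priority_field, top_value):
    """Pre-filter: restrict the domain up front to docs already known to be
    the 'top doc' of their group, so no collapse is needed at query time.
    Assumption: exactly one doc per group has priority_field == top_value."""
    return {"q": user_query, "fq": f"{priority_field}:{top_value}"}


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(urlencode(collapse_params("laptop", "group_id", "priority")))
    print(urlencode(prefilter_params("laptop", "priority", 1)))
```

Note that the second form only ever matches the preferred docs, so a group whose match is on a lower-priority doc drops out of the results entirely. That is the "full uncollapsed domain" caveat above.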
Michael

On Thu, Mar 24, 2022 at 9:11 AM Joel Bernstein <joels...@gmail.com> wrote:

> Yeah, that's a tricky problem. Keeping the result set small without losing
> results. I don't have an answer except as you already mentioned which would
> be to limit the query in some way.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Mar 24, 2022 at 8:24 AM Jeremy Buckley - IQ-C
> <jeremy.buck...@gsa.gov.invalid> wrote:
>
> > Thanks, Joel, that is exactly what we are doing. We have four shards and
> > are sharding on the collapse key. Performance is fine (subsecond) as long
> > as the result set is relatively small. I am really looking for the best
> > way to ensure that this is always true.
> >
> > On Wed, Mar 23, 2022 at 10:18 PM Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > To collapse on 30 million distinct values is going to cause memory
> > > problems for sure. If the heap is growing as the result set grows that
> > > means you are likely using a newer version of Solr which collapses into
> > > a hashmap. Older versions of Solr would collapse into an array 30
> > > million in length which probably would have blown up memory with even
> > > small result sets.
> > >
> > > I think you're going to need to shard to get this to perform well. With
> > > SolrCloud you can shard on the collapse key (
> > > https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing
> > > ).
> > > This will send all documents with the same collapse key to the same
> > > shard. Then run the collapse query on the sharded collection.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
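
For anyone reading the thread later, the "shard on the collapse key" setup quoted above can be sketched roughly as follows, assuming the collection uses the compositeId router. With that router, the part of the document ID before the `!` is hashed to pick the shard, so docs sharing a collapse key land together. The helper name and sample values are hypothetical:

```python
def routed_id(collapse_key, doc_id):
    """Build a compositeId-style document ID of the form 'key!id'.

    The prefix before '!' decides the shard, so every doc with the same
    collapse key is indexed on the same shard.
    """
    # Conservative choice for this sketch: reject keys that already contain
    # the separator, since that would change how the ID is split for routing.
    if "!" in collapse_key:
        raise ValueError("collapse key must not contain '!'")
    return f"{collapse_key}!{doc_id}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(routed_id("sku123", "doc9"))
```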