Are you determining your "top doc" for each collapsed group based on score?
If your use case is such that you determine the "top doc" based on a static
field with a manageable number of values, you may have other options
available to you. (For some use cases it can be acceptable to "pre-filter"
the domain with creative fq params. This works iff your "collapse" could be
considered a type of "deduplication" with doc priority determined by a
static field; but it's a non-starter if you know you need to search over
the full uncollapsed domain.)
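
To make the contrast concrete, here is a rough sketch of the two request shapes, not a recipe for your schema. The `{!collapse field=...}` filter is the standard collapsing query parser. The field names `group_id` and `is_preferred` are made up for illustration, and the pre-filter variant assumes you can flag the one doc you want per group at index time, which is just one way to read "priority determined by a static field".

```python
from urllib.parse import urlencode

def collapse_params(user_query, collapse_field="group_id"):
    """Query-time collapse: Solr keeps one doc per distinct collapse_field value."""
    return [
        ("q", user_query),
        ("fq", "{!collapse field=%s}" % collapse_field),
    ]

def prefiltered_params(user_query, flag_field="is_preferred"):
    """Pre-filter alternative: search only docs flagged at index time.

    Assumes a hypothetical boolean field that marks the single preferred
    doc in each group, so no collapse step is needed at query time.
    """
    return [
        ("q", user_query),
        ("fq", "%s:true" % flag_field),
    ]

if __name__ == "__main__":
    # Print the encoded query strings for both variants.
    print(urlencode(collapse_params("title:solr")))
    print(urlencode(prefiltered_params("title:solr")))
```

The second form never searches the non-preferred docs at all, which is exactly why it only works when the collapse is really a dedup.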

Michael

On Thu, Mar 24, 2022 at 9:11 AM Joel Bernstein <joels...@gmail.com> wrote:

> Yeah, that's a tricky problem. Keeping the result set small without losing
> results. I don't have an answer except as you already mentioned which would
> be to limit the query in some way.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Mar 24, 2022 at 8:24 AM Jeremy Buckley - IQ-C
> <jeremy.buck...@gsa.gov.invalid> wrote:
>
> > Thanks, Joel, that is exactly what we are doing.  We have four shards and
> > are sharding on the collapse key.  Performance is fine (subsecond) as long
> > as the result set is relatively small.  I am really looking for the best
> > way to ensure that this is always true.
> >
> > On Wed, Mar 23, 2022 at 10:18 PM Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > To collapse on 30 million distinct values is going to cause memory problems
> > > for sure. If the heap is growing as the result set grows that means you are
> > > likely using a newer version of Solr which collapses into a hashmap. Older
> > > versions of Solr would collapse into an array 30 million in length which
> > > probably would have blown up memory with even small result sets.
> > >
> > > I think you're going to need to shard to get this to perform well. With
> > > SolrCloud you can shard on the collapse key
> > > (https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing).
> > > This will send all documents with the same collapse key to the same shard.
> > > Then run the collapse query on the sharded collection.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> >
>
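
For anyone reading this thread later, the routing Joel describes can be sketched as follows, assuming the collection uses SolrCloud's default compositeId router. With that router, the part of the document id before the `!` is hashed to pick the shard, so using the collapse key as the prefix co-locates every doc in a group. The function name, the `id` and `group_id` field names, and the sample values are hypothetical.

```python
def routed_id(collapse_key, doc_id):
    """Build a compositeId-style id: '<collapse_key>!<doc_id>'.

    The prefix before '!' is what the compositeId router hashes, so docs
    sharing a collapse key end up on the same shard.
    """
    if "!" in collapse_key:
        # A '!' inside the key would change where the prefix ends.
        raise ValueError("collapse key must not contain '!'")
    return f"{collapse_key}!{doc_id}"

def to_solr_doc(collapse_key, doc_id, **fields):
    """Assemble a document dict; 'id' and 'group_id' are assumed field names."""
    doc = {"id": routed_id(collapse_key, doc_id), "group_id": collapse_key}
    doc.update(fields)
    return doc

if __name__ == "__main__":
    print(to_solr_doc("sku123", "doc9", title="example"))
```

The collapse query itself does not change; only the ids written at index time do.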
