Thanks, Joel, that is exactly what we are doing. We have four shards and are sharding on the collapse key. Performance is fine (subsecond) as long as the result set is relatively small. I am really looking for the best way to ensure that this is always true.
On Wed, Mar 23, 2022 at 10:18 PM Joel Bernstein <joels...@gmail.com> wrote:

> To collapse on 30 million distinct values is going to cause memory problems
> for sure. If the heap is growing as the result set grows that means you are
> likely using a newer version of Solr which collapses into a hashmap. Older
> versions of Solr would collapse into an array 30 million in length which
> probably would have blown up memory with even small result sets.
>
> I think you're going to need to shard to get this to perform well. With
> SolrCloud you can shard on the collapse key (
> https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing
> ).
> This will send all documents with the same collapse key to the same shard.
> Then run the collapse query on the sharded collection.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
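For anyone following along, a minimal sketch of the routing-plus-collapse approach Joel describes. The collection name (`mycollection`), field name (`group_id`), and document IDs are hypothetical; the mechanics are standard SolrCloud compositeId routing (`<routeKey>!<docId>` in the `id` field) and the collapse query parser:

```shell
# Index with a compositeId route key prefix so every document sharing a
# collapse key hashes to the same shard (names below are examples only)
curl -X POST 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "group42!doc1", "group_id": "group42"},
       {"id": "group42!doc2", "group_id": "group42"}]'

# Collapse on the same key; since all members of a group live on one shard,
# each shard collapses independently and no cross-shard merging of groups is needed
curl -G 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq={!collapse field=group_id}'
```

The key point is that the router hashes only the part of the `id` before the `!`, so co-location of a group is guaranteed, and the per-shard collapse map only has to hold the distinct keys present on that shard rather than all 30 million.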