Solr has two features meant to support deep paging and large result sets:

Export handler (requires docValues on the fields you export and sort by)
<https://solr.apache.org/guide/solr/latest/query-guide/exporting-result-sets.html>
cursorMark (requires serial requests, each passing a token from the prior
response)
<https://solr.apache.org/guide/solr/latest/query-guide/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors>

As Shawn noted, simply paging with the start and rows parameters breaks
down when you try to pull large result sets. Each page gets slower the
deeper it is. By the last page you'll be asking the server to hold (and
potentially sort) every document that matches the query in memory.
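
For the cursorMark route, here is a rough sketch of what the paging loop
could look like, in Python using only the standard library. The function
names (fetch_page, iter_docs), the collection URL, and the "id" uniqueKey
field are assumptions for illustration, not anything taken from your setup.
The default of 5000 rows per request just mirrors the limit you mentioned.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Send one /select request to Solr and return the decoded JSON response."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_docs(base_url, query, sort, rows=5000, fetch=fetch_page):
    """Yield every document matching `query`, one cursor page at a time.

    `sort` must include the collection's uniqueKey field as a tie-breaker
    (assumed here to be "id", e.g. "id asc"); Solr rejects cursorMark
    requests whose sort does not include it.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "sort": sort,
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        data = fetch(base_url, params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

You would call it along the lines of
iter_docs("http://localhost:8983/solr/mycollection", "*:*", "id asc") and
stop consuming once you have the 100000 records you need. Each request has
to wait for the previous response, so this is serial by design, which also
fits the no-multi-threading constraint.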

On Tue, Feb 20, 2024 at 5:28 PM Shawn Heisey <apa...@elyograg.org.invalid>
wrote:

> On 2/13/24 19:46, Fiz N wrote:
> > Hi SOLR Experts,
> >
> > I am facing one challenge regarding performance in *Java Spring Boot +
> > Apache Solr Cloud *based application.
> >
> > Details are as follows:
> >
> > *Data present in Solr collection*: 41 million
> >
> > *Shards:* 8
> >
> > *Replicas:* 2
> >
> > *Solr Version:* 8.2.0
>
> There is not a lot of info here.  How big is each core (shard replica)?
> How many cores per server?  How many servers?  How much total system
> memory?  What is the Solr heap size?  Do you have multiple Solr
> instances on one server?  Have you tried removing Spring Boot from the
> equation and making queries directly to Solr with your browser?
>
> >
> > *Issue:* we are having export functionality based on filters applied by
> > user on data, maximum export limit is 1 lakh records. It is taking more
> > time than expected (approximately 10 min and even its not consistent.
> > Sometimes in seconds and sometimes more than 8-10 min)
>
> What is the precise request you are sending to Solr?  We will probably
> need your schema, your solrconfig.xml, and the contents of a typical
> document.
>
> > *Limitation:*
> >
> >     1. Multi-threading cannot be used.
>
> This is extremely vague.  No idea what you mean here.  Solr itself is
> inherently multi-threaded and this cannot be disabled.
>
> >     2. We are not allowed to pull more than 5000 records in one Solr
> >     call as this Solr instance is shared among other applications as well.
>
> If you can only pull 5000 at a time and you need to retrieve 100000
> records, how are you doing the pagination?  Using the start and rows
> parameter will be slow once you get a few pages in.
>
> Thanks,
> Shawn
>
>

-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
