First of all, thanks to all who have replied to this question.

Just to make things clear, my use case is not a typical one, i.e. I am not going to show the first 50 or 100 results.

My use case is to create a CSV file (a kind of matrix) based on what the user filters in the web application, and the result set can range from hundreds to millions of documents.

Firing a SolrRequest again and again and asking for results (maybe 10-100 at a time) from the web application will increase the time it takes until the CSV file is done.

So I would like to know, from your experience, the largest number of documents that I can reasonably request from Solr in one go, so that the number of requests from the web application to Solr is minimal.
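To put rough numbers on this trade-off: the number of round trips is the result-set size divided by the rows value, rounded up. The following is only a minimal sketch of that arithmetic, using the 3.5 million document core mentioned elsewhere in this thread as the total. The class and method names are illustrative and are not part of SolrJ.

```java
// Illustrative only: RequestCount and requestsNeeded are hypothetical names,
// not part of SolrJ or Solr.
final class RequestCount {

    /** Number of requests needed to fetch totalDocs documents at `rows` per request. */
    static long requestsNeeded(long totalDocs, int rows) {
        if (totalDocs < 0 || rows <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and rows must be > 0");
        }
        // Ceiling division: a final partial page still costs one request.
        return (totalDocs + rows - 1) / rows;
    }

    public static void main(String[] args) {
        long totalDocs = 3_500_000L; // size of the larger core mentioned in the thread
        for (int rows : new int[] {10, 100, 1_000, 10_000}) {
            System.out.printf("rows=%d -> %d requests%n", rows, requestsNeeded(totalDocs, rows));
        }
    }
}
```

This only counts requests. It says nothing about how long each request takes or how much memory each response needs on the application side.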

Documents mainly consist of string fields of up to 255 characters.

I am trying out different values for the rows parameter in the request at my end, but I would like to hear from the Solr community about the advantages and disadvantages of doing this, or about a better way of doing it, as I am totally new to Solr.

Also, I will be using SolrJ, so pointers in that direction would be most helpful.


Regards

Neha Gupta


On 28/04/2022 16:53, Christopher Schultz wrote:
Neha,

On 4/27/22 16:35, Neha Gupta wrote:
I have different cores with different numbers of documents.

1) Core 1: 227,625 documents, each having approximately 10 string fields.

2) Core 2: approximately 3.5 million documents, each having 3 string fields.

We still have no idea about the size of the documents you are talking about. Your "3 string fields" could still be gigabytes of data per document. But maybe you meant "short string fields between 0 and 255 characters" or something like that. But if that's what you meant, you should have said that.

So my question is: if I request, let's say, approximately 10K documents in one request using SolrJ, will that be OK?

Solr will be fine. Will your application be able to handle that much data?

By safe here I mean the approximate maximum number of documents that I can
request without causing any problem in receiving a response from
Solr.
This depends entirely upon your application. If you request 10k documents, and each document requires 1 MiB of memory, and you store every document from Solr in memory in your application, then you will require roughly 10 GiB of heap space just for that one response. If you have multiple threads making those kinds of requests to Solr all at the same time, you will need 10 GiB * N threads of heap space.
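The arithmetic above can be sketched as follows. This is only a back-of-the-envelope estimate, not a measurement: the class and method names are hypothetical, and the "small document" figure assumes 10 string fields of 255 characters at 2 bytes per character, ignoring per-object overhead. Note that 10,000 documents at 1 MiB each come to 10,000 MiB, which is about 9.77 GiB, so "10 GiB" is a round figure.

```java
// Illustrative only: HeapEstimate is a hypothetical helper, and the
// per-document sizes used in main are assumptions, not measured values.
final class HeapEstimate {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Heap needed if every document of every in-flight response is held in memory. */
    static long requiredHeapBytes(long docsPerResponse, long bytesPerDoc, int concurrentRequests) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
        long perResponse = Math.multiplyExact(docsPerResponse, bytesPerDoc);
        return Math.multiplyExact(perResponse, (long) concurrentRequests);
    }

    public static void main(String[] args) {
        // The case from the message: 10k documents of 1 MiB each, one thread.
        long large = requiredHeapBytes(10_000, MIB, 1);
        System.out.printf("10k docs x 1 MiB, 1 thread: %.2f GiB%n", (double) large / GIB);

        // Assumed small document: 10 fields x 255 chars x 2 bytes per char.
        long smallDocBytes = 10L * 255L * 2L;
        long small = requiredHeapBytes(10_000, smallDocBytes, 1);
        System.out.printf("10k small docs, 1 thread: %.2f MiB%n", (double) small / MIB);
    }
}
```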

Is that enough to answer the question?

Is anyone going to ever look at 10k worth of documents at a time? That seems like quite a lot.

Maybe your use-case isn't a typical "search for products in a sales catalog and show them 50-at-a-time to a web user".

Knowing what your use-case is would be very helpful to answer the question "is this a good idea?"

-chris

On 27/04/2022 22:26, Andy Lester wrote:

On Apr 27, 2022, at 3:23 PM, Neha Gupta <neha.gu...@uni-jena.de> wrote:

Just for information, I will be firing queries from a Java application to Solr using SolrJ and would like to know the maximum number of documents (i.e. the maximum number of rows that I can request in the query) that can be returned safely from Solr.
It’s impossible to answer that. First, how do you mean “safe”? How big are your documents?

Let’s turn it around. Do you have a number in mind where you’re wondering if Solr can handle it? Like you’re thinking “Can Solr handle 10 million documents averaging 10K each”? That’s much easier to address.

Andy
