Neha,
On 4/28/22 16:54, Neha Gupta wrote:
Just to make things clear, my use case is not a typical one, i.e. I am not
going to show the first 50 or 100 results.
My use case is to create a CSV file (matrix-like) based on what the
user filters in the web application, and the resulting set can range
from hundreds to millions of documents.
Firing SolrRequests again and again and asking for results (maybe 10-100
at a time) from the web application will increase the amount of time until
the CSV file is done.
So I just want to know from your experience what the optimal maximum
number of documents is that I can request from Solr in one go, so that the
number of requests from the web application to Solr is minimal.
If you use cursors in Solr, I'm not sure it really matters too much how
many documents you request at once. Honestly, your application is likely
to be the bottleneck.
But I'm assuming that you are going to stream-to-disk or at least
stream-to-client so maybe that doesn't matter, either.
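For example, here is a minimal SolrJ sketch of a cursor loop that streams
each page straight to a CSV file as it arrives. The Solr URL, the core
name "core1", the field names, and the "id" unique key are placeholders
for whatever your setup actually uses (and the CSV writing here does no
quoting or escaping):

import java.io.PrintWriter;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CsvExport {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and core name -- adjust for your environment.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();
             PrintWriter out = new PrintWriter("export.csv")) {

            SolrQuery q = new SolrQuery("*:*");          // or the user's filter query
            q.setRows(1000);                             // page size: tune this
            q.setFields("id", "field_a", "field_b");     // fetch only what the CSV needs
            q.setSort(SolrQuery.SortClause.asc("id"));   // cursors require a sort on the unique key

            String cursorMark = CursorMarkParams.CURSOR_MARK_START;
            boolean done = false;
            while (!done) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                QueryResponse rsp = client.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    // Write each page to disk immediately instead of accumulating it.
                    out.println(doc.getFieldValue("id") + ","
                            + doc.getFieldValue("field_a") + ","
                            + doc.getFieldValue("field_b"));
                }
                String next = rsp.getNextCursorMark();
                done = cursorMark.equals(next);          // same mark back == no more results
                cursorMark = next;
            }
        }
    }
}

With this shape, client heap usage is bounded by one page of results no
matter how many millions of documents the filter matches.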
Documents mainly consist of string fields up to 255 characters.
This doesn't really matter much, as long as you are streaming everything
and not buffering.
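If you want to avoid holding even one page of results in client memory,
SolrJ also has a streaming callback API. A rough sketch, using the same
placeholder URL, core name, and field names as above:

import java.io.PrintWriter;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.StreamingResponseCallback;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class StreamExport {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build();
             PrintWriter out = new PrintWriter("export.csv")) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(10000); // a large page is fine: documents are parsed one at a time below
            client.queryAndStreamResponse("core1", q, new StreamingResponseCallback() {
                @Override
                public void streamDocListInfo(long numFound, long start, Float maxScore) {
                    // Called once with the result header, before any documents arrive.
                }
                @Override
                public void streamSolrDocument(SolrDocument doc) {
                    // Each document is handled as it comes off the wire; nothing accumulates.
                    out.println(doc.getFieldValue("id"));
                }
            });
        }
    }
}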
I am trying out different values for the rows parameter in the
request at my end, but I just want to hear from the Solr community about
the advantages and disadvantages of doing this, or a better way of
doing it, as I am totally new to Solr.
Also, I will be using SolrJ, so pointers in that direction will be most
helpful.
You could also probably just ... try it. There are no benchmarks better
than ones against your own environment.
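A throwaway timing loop is usually all it takes to find the sweet spot
for rows on your own hardware; something like this (again with a
placeholder URL and core name):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RowsBenchmark {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            for (int rows : new int[] {100, 1_000, 10_000, 100_000}) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(rows);
                long t0 = System.nanoTime();
                // Fetch one page of the given size and time the round trip.
                long returned = client.query("core1", q).getResults().size();
                long ms = (System.nanoTime() - t0) / 1_000_000;
                System.out.printf("rows=%d -> %d docs in %d ms%n", rows, returned, ms);
            }
        }
    }
}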
-chris
On 28/04/2022 16:53, Christopher Schultz wrote:
Neha,
On 4/27/22 16:35, Neha Gupta wrote:
I have different cores with different numbers of documents.
1) Core 1: 227625 docs, each document having approx. 10 string fields.
2) Core 2: approx. 3.5 million documents, each having 3 string fields.
We still have no idea about the size of the documents you are talking
about. Your "3 string fields" could still be gigabytes of data per
document. But maybe you meant "short string fields between 0 and 255
characters" or something like that. If that's what you meant, you
should have said so.
So my question is: if I request, let's say, approximately 10K documents
in one request using SolrJ, will that be OK?
Solr will be fine. Will your application be able to handle that much
data?
By safe here I mean the approximate maximum number of documents that I
can request without causing any problem in receiving a response from
Solr.
This depends entirely upon your application. If you request 10k
documents, and each document requires 1MiB of memory, and you store
every document from Solr in memory in your application, then you will
require 10 GiB of heap space just for that one response. If you have
multiple threads making those kinds of requests to Solr all at the
same time, you will need 10GiB * N threads of heap space.
Is that enough to answer the question?
Is anyone going to ever look at 10k worth of documents at a time? That
seems like quite a lot.
Maybe your use-case isn't a typical "search for products in a sales
catalog and show them 50-at-a-time to a web user".
Knowing what your use-case is would be very helpful to answer the
question "is this a good idea?"
-chris
On 27/04/2022 22:26, Andy Lester wrote:
On Apr 27, 2022, at 3:23 PM, Neha Gupta <neha.gu...@uni-jena.de> wrote:
Just for information, I will be firing queries from a Java application
to Solr using SolrJ, and would like to know the maximum number of
documents (i.e. the maximum number of rows that I can request in the
query) that can be returned safely from Solr.
It’s impossible to answer that. First, how do you mean “safe”? How
big are your documents?
Let’s turn it around. Do you have a number in mind where you’re
wondering if Solr can handle it? Like you’re thinking “Can Solr
handle 10 million documents averaging 10K each”? That’s much easier
to address.
Andy