On 11/17/21 7:05 AM, Derek C wrote:
Hi all,

I'm trying to get Near Real Time searching working with SOLR (so that
documents I insert, or documents I update, are visible in a SOLR query as
quickly as possible).
<snip>
I have about 2.2 million documents in a SOLR core (quite a lot of fields
too - maybe 40 and a lot are indexed=true as well).  I'm using
ClassicIndexSchemaFactory rather than ManagedIndexSchemaFactory.

Right now I'm running on a single VM with 16Gbytes of memory and 12GB given
to SOLR (displayed as JVM-Memory on the Dashboard page - right now it's
saying 74.3% / 8.92GB of 12.00GB in use).
<snip>
<autoCommit>
  <maxDocs>100000000</maxDocs>
  <maxTime>86400000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>


The first thing I would change is autoCommit.  Go with something like this:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

With limits of 24 hours or 100 million documents, autoCommit might as well not be configured at all.

One second is FAR too aggressive for autoSoftCommit.  Unless your index is super tiny, which is not a description I would apply to 2.2 million documents, an interval that low will tend to CAUSE problems.  For a change to take 10 minutes to become visible is extremely odd, and probably indicates a very large performance problem with your setup.
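
As a rough sketch of how the two settings could sit together in solrconfig.xml: the autoCommit block is the one suggested above, while the 30 second autoSoftCommit interval is purely an illustrative assumption, not a value tuned for this index.  The right number depends on how long opening a new searcher actually takes on your hardware.

```xml
<!-- Hard commit: flushes changes to disk and rolls over the transaction log.
     With openSearcher=false it does NOT make new documents visible. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: opens a new searcher, which is what makes changes visible.
     30000 ms is an assumed placeholder; use the longest delay you can tolerate. -->
<autoSoftCommit>
  <maxTime>30000</maxTime>
</autoSoftCommit>
```

With a layout like this, document visibility is governed by autoSoftCommit alone, and the hard commit only handles durability, so the two intervals can be adjusted independently.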

You did not indicate what version of Solr you have, or how large that index is on disk.

Can you gather a screenshot from the server, put it on a file sharing site, and provide a URL for it?  Sending it as an email attachment is unlikely to succeed.  This wiki page describes what I am looking for:

https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue

It would also be useful for us to have solr.log and solr_gc.log covering the time period from when you index a change to when the document becomes visible.  The whole unedited file, not an excerpt.

Dropbox and gist are two good choices for sharing files.  There are many others.

The fact that you have Solr in a VM means that if there are any performance issues relating to the VM host, they could be translating into problems for Solr.  The possibilities for problems at the VM host level are numerous.

Thanks,
Shawn

