On 10/6/22 15:54, Dominique Bejean wrote:
We are starting to investigate performance issues with a new customer.
There are several bad practices (commit, sharding, replica count and
types, heap size, ...) that can explain these issues, and we will work on
them in the next few days. I agree we need to better understand the
specific usage and run some tests after fixing the bad practices.

Anyway, one of the specific aspects is these huge servers, so I am trying
to see the best way to use all of these resources.


* Why do you want to split it up at all?

Because one of the bad practices is a huge heap size (80 GB). I am pretty
sure this heap size is not required, and in any case it does not respect
the 31 GB limit (above which the JVM loses compressed object pointers).
After determining the best heap size, if this size is near 31 GB, I imagine
it is better to have several Solr JVMs with smaller heaps, for instance
2 Solr JVMs with 20 GB each or 4 Solr JVMs with 10 GB each.

IMHO, lowering the heap size requirement is the ONLY reason to run more than one Solr instance on a server.  But I would go with 2 JVMs at 20 GB each rather than 4 at 11 GB.  Reduce the number of moving parts that you must track and manage.  The minimum heap requirement of two JVMs each with half the data will be a little bit larger than the minimum requirement of one JVM with all the data, due to JVM overhead.  I don't have a number for you on how much overhead there is for each JVM.
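
If you do split it up, each instance is just a separate Solr install started on its own port, with its own solr home and its own heap setting.  Purely as an illustration, not a recommendation (the ports, paths, and the 20g figure below are placeholders for your own values):

  # Two Solr instances on one server, 20 GB heap each.
  # -p = port, -s = solr home, -m = min and max heap (-Xms/-Xmx).
  bin/solr start -p 8983 -s /var/solr/node1 -m 20g
  bin/solr start -p 8984 -s /var/solr/node2 -m 20g

In SolrCloud mode both instances would also point at the same ZooKeeper ensemble, so they register as two separate nodes of the same cluster.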

* MMapDirectory JVM sharing

This point is the main reason for my message. If several Solr JVMs are
running on one server, will MMapDirectory work fine, or will the JVMs fight
with each other over the off-heap memory?

There would be little or no difference in the competition for disk cache memory with one JVM or several.  MMapDirectory does not allocate that memory inside the JVM; the mapped index files live in the operating system's page cache, which is shared by everything running on the machine.
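
For reference, the directory implementation is configured by the directoryFactory element in solrconfig.xml.  The stock config typically ships with something like the snippet below (exact defaults vary by Solr version), and on a 64-bit JVM that ends up using MMapDirectory underneath, so normally there is nothing to change here:

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>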

Storage configuration is the second point that I would like to investigate
in order to better share disk resources.
Instead of having one single RAID 6 volume, isn't it better to have one
distinct non-RAID volume per Solr node (if multiple Solr nodes are running
on the server), or multiple non-RAID volumes used by a single Solr JVM (if
only one Solr node is running on the server)?

It is true that if each instance is running on its own disk, what a single instance does will have zero effect on another instance.

But RAID10 can mean even better performance than one mirror set for each instance.  Here are some "back of the envelope" calculations for you.  Let's assume that each of the drives has a sustained throughput of 125 megabytes per second.  Most modern SATA disks can exceed that, and high-RPM enterprise SAS disks are faster.  SSD beats them all by a wide margin.

If you move to an 8-drive RAID10 array with slower disks like the ones I just described, then the array has a potential data write rate of 500 MB/s, because the array consists of four mirror sets with the volume striped across them.  A well-designed RAID controller can potentially achieve an even higher read rate than 500 MB/s by taking advantage of the fact that every bit of data actually exists on two drives, not just one.  The highest possible data rates will not always happen, but the average throughput will very often exceed the single-disk rate of 125 MB/s.
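
Laying that arithmetic out, with the same 125 MB/s per-drive assumption:

  8-drive RAID10 = 4 mirror sets, striped
  writes:  4 mirror sets x 125 MB/s   = ~500 MB/s
  reads:   up to 8 drives x 125 MB/s  = ~1000 MB/s in theory, since
           either side of a mirror can serve a read
  single non-RAID disk per instance   = ~125 MB/s for that instance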

There is another important consideration that applies no matter how the storage is arranged:  If the machine has sufficient spare memory, most of the data that Lucene needs will be sitting in the disk cache at all times and transfer at incredible speed, and the amount of data that is actually read from disk will be relatively small.  This is the secret to stellar Solr performance:  Lots of memory beyond what is needed by program heaps, so that disk accesses are not needed very often.
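
On Linux, the free command gives a rough picture of this:  the "buff/cache" column is the current size of the disk cache, and "available" estimates how much memory programs could still claim without forcing the system to swap.

  free -h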

Thanks,
Shawn
