On 7/5/2022 3:11 PM, Christopher Schultz wrote:
Well, if you need more than 32GiB, I think the recommendation is to go MUCH HIGHER than 32GiB. If you have a 48GiB machine, maybe restrict to 31GiB of heap, but if you have a TiB, go for it :)
I remember reading somewhere, likely for a different program than Solr, that the observed break-even point for 64-bit pointers was 46GB. The level of debugging and introspection required to calculate that number would be VERY extensive. Most Solr installs can get by with a max heap size of 31GB or less, even if they are quite large. For those that need more, I would probably want to see a heap size of at least 64GB. It is probably better to use SolrCloud and split the index across more servers to keep the heap requirement low than to use a really massive heap.
This is why I said "uhh..." above: the JVM needs more memory than the heap. Sometimes as much as twice that amount, depending upon the workload of the application itself. Measure, measure, measure.
It would be interesting to see how much overhead there really is for Solr with various index sizes. We have seen people run into OOM problems after changing *only* the GC settings, for example switching from CMS to G1. Solr has used G1 out of the box for a while now.
I'm interested to know what the relation is between on-disk index size and in-memory index size. I would imagine that the on-disk artifacts are fairly slim (only storing what is necessary) and the in-memory representation has all kinds of "waste" (like pointers and all that). Has anyone done a back-of-the-napkin calculation to guess at the in-memory size of an index given the on-disk representation?
That is an interesting question. One reason Lucene queries are so fast when there is plenty of memory is that it accesses the index files on disk directly with MMAP, so there is no need to copy the really massive data structures into the heap at all. The data is served from the OS page cache instead, which is why memory outside the heap matters so much.
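To illustrate what memory mapping means here, this is a minimal sketch in Python rather than Lucene's own Java code. The helper names `make_demo_file` and `read_slice_mmap` are made up for the example. It maps a file read-only and pulls out a small slice; only the pages that are touched need to be resident, and they live in the OS page cache rather than in the application's own heap.

```python
import mmap
import os
import tempfile


def make_demo_file(size_bytes):
    """Create a temporary file filled with a repeating 0..255 byte pattern.

    Hypothetical helper, only here so the example is self-contained.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size_bytes)))
    return path


def read_slice_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` via a read-only memory map.

    The file is not read into a buffer up front; the OS pages in only
    the region that the slice touches.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    demo_path = make_demo_file(1024 * 1024)
    try:
        print(read_slice_mmap(demo_path, 256, 4))
    finally:
        os.remove(demo_path)
```

The same idea is why a Solr server benefits from RAM that is deliberately left unallocated: the OS uses it to cache the mapped index files.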
I believe the OP is having problems because they need a total memory size far larger than 64GB to handle 500GB of index data, and they should also have dedicated hardware for Solr so there is no competition with other software for scarce system resources.
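A rough back-of-the-napkin version of that argument, as a Python sketch. The `non_heap_factor` and `os_reserve_gib` defaults are placeholder assumptions, not measured Solr numbers, and the function names are invented for this example. As noted above, the non-heap overhead of a JVM can be much higher than the 25% assumed here, so measure on the real system.

```python
def page_cache_budget(total_ram_gib, heap_gib,
                      non_heap_factor=0.25, os_reserve_gib=2.0):
    """Estimate the RAM (GiB) left for the OS page cache.

    non_heap_factor: assumed JVM overhead beyond the heap, as a fraction
                     of the heap (placeholder value, workload dependent).
    os_reserve_gib:  assumed memory kept for the OS and other processes
                     (placeholder value).
    """
    jvm_total = heap_gib * (1 + non_heap_factor)
    return max(0.0, total_ram_gib - jvm_total - os_reserve_gib)


def cacheable_fraction(index_gib, total_ram_gib, heap_gib, **kwargs):
    """Fraction of the on-disk index that could sit in the page cache."""
    budget = page_cache_budget(total_ram_gib, heap_gib, **kwargs)
    return min(1.0, budget / index_gib)


if __name__ == "__main__":
    # 500GB index on a 64GB machine with a 31GB heap.
    budget = page_cache_budget(64, 31)
    fraction = cacheable_fraction(500, 64, 31)
    print(f"page cache budget: {budget:.2f} GiB")
    print(f"index fraction cacheable: {fraction:.1%}")
```

Under these assumptions, less than 5% of a 500GB index could be cached on a 64GB machine, and that is before any other software on the box takes its share.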
Thanks, Shawn