What happens on 9.x?? :)
> On Nov 25, 2022, at 11:33 AM, Richard Goodman <richa...@brandwatch.com> wrote:
>
> Hi there,
>
> We have a cluster spread over 72 instances on k8s hosting around 12.5
> billion documents (made up of 30 collections, each collection having 12
> shards). We were originally using 7.7.2, and performance was good enough
> for our business needs. We recently upgraded our cluster to v8.11.2 and
> have noticed a drop in performance. I appreciate that there have been a
> lot of changes from 7.7.2 to 8.11.2, but I have been collecting metrics,
> and although the configuration (instance type, resource allocation,
> startup opts) is the same, we are completely at a loss as to why it's
> performing worse, and were wondering if anyone had any guidance?
>
> I recently came across these tickets:
>
> - SOLR-15840 <https://issues.apache.org/jira/browse/SOLR-15840> -
>   Performance degradation with http2
> - SOLR-16099 <https://issues.apache.org/jira/browse/SOLR-16099> - HTTP
>   Client threads can hang
>
> These in particular sparked our interest, so we spun up a parallel cluster
> with -Dsolr.http1=true, and there was no difference in performance. We're
> testing a couple of other ideas, such as a different DirectoryFactory *(as I
> saw a message from someone in the Solr Slack about there being an issue
> with the MMap directory and vm.max_map_count)* and some GC settings, but
> are really open to any suggestions. If it would help with any
> performance-related topics, we're also happy to use this cluster to test
> patches at a large scale *(more specifically for the two Solr tickets
> listed above)*.
>
> I thought it would be useful to show some metrics I collected while we had
> 2 clusters spun up, one on 7.7.2 and one on 8.11.2, where the 8.11.2
> cluster was the active one and all traffic was being shadow loaded into the
> 7.7.2 cluster to compare against. It's important to note that both clusters
> had the same configuration; here is a list to name a few settings:
>
> - G1GC garbage collector
> - TLOG replication
> - 27Gi memory per instance
> - 16Gi assigned to -Xmx and -Xms
> - 16 cores
> - -XX:G1HeapRegionSize=4m
> - -XX:G1ReservePercent=20
> - -XX:InitiatingHeapOccupancyPercent=35
>
> One metric that did stand out was that 8.11.2 was churning through *a lot*
> of eden space in the heap, which can be seen in some of the screenshots of
> metrics below:
>
> Total Memory Usage: [screenshots for 7.7.2 and 8.11.2]
>
> Total Used G1 Pools: [screenshots for 7.7.2 and 8.11.2]
>
> And finally, the overall thread pool: [screenshots for 7.7.2 and 8.11.2]
>
> Any guidance, or requests for performance-related things to test, would be
> appreciated.
>
> Thanks,
>
> Richard
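
For anyone trying to reproduce the comparison, the JVM settings listed in the quoted message could be written in solr.in.sh roughly as below. This is only a sketch: it assumes the stock bin/solr start script and its SOLR_HEAP, GC_TUNE and SOLR_OPTS variables, and the original poster's actual startup configuration was not shown in the thread.

```shell
# Hypothetical solr.in.sh fragment mirroring the settings in the quoted list.
# Assumption: the stock bin/solr script, which reads these variables.

# SOLR_HEAP sets both -Xms and -Xmx to the same value.
SOLR_HEAP="16g"

# G1 settings as listed in the quoted message.
GC_TUNE="-XX:+UseG1GC"
GC_TUNE="$GC_TUNE -XX:G1HeapRegionSize=4m"
GC_TUNE="$GC_TUNE -XX:G1ReservePercent=20"
GC_TUNE="$GC_TUNE -XX:InitiatingHeapOccupancyPercent=35"

# Only the parallel test cluster used this flag (see SOLR-15840);
# leave it out to keep the default HTTP/2 client.
SOLR_OPTS="${SOLR_OPTS:-} -Dsolr.http1=true"
```

Keeping the file identical on both clusters, apart from the one flag under test, makes it easier to attribute any difference in eden churn to the Solr version rather than to the JVM setup.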