Hi Charlie,

Gah, thanks for informing me of that. Here is a link to the images: <https://imgur.com/a/yEmBGuv>
Cheers,

On Tue, 29 Nov 2022 at 13:23, Charlie Hull <ch...@opensourceconnections.com> wrote:

> Hey Richard,
>
> Attachments are stripped by this list, so you might want to upload them
> somewhere and link to them.
>
> Cheers
>
> Charlie
>
> On 25/11/2022 17:33, Richard Goodman wrote:
> > Hi there,
> >
> > We have a cluster spread over 72 instances on k8s, hosting around 12.5
> > billion documents (made up of 30 collections, each with 12 shards). We
> > were originally using 7.7.2, and performance was good enough for our
> > business needs. We then recently upgraded the cluster to v8.11.2 and
> > have noticed a drop in performance. I appreciate that there have been a
> > lot of changes from 7.7.2 to 8.11.2, but I have been collecting metrics,
> > and although the configuration (instance type, resource allocation,
> > start-up opts) is the same, we are completely at a loss as to why it's
> > performing worse, and were wondering if anyone had any guidance.
> >
> > I recently stumbled across two tickets that sparked our interest:
> >
> > - SOLR-15840 <https://issues.apache.org/jira/browse/SOLR-15840> -
> >   Performance degradation with http2
> > - SOLR-16099 <https://issues.apache.org/jira/browse/SOLR-16099> -
> >   HTTP Client threads can hang
> >
> > So we spun up a parallel cluster with -Dsolr.http1=true, but there was
> > no difference in performance. We're testing a couple of other ideas,
> > such as a different DirectoryFactory *(I saw a message in the Solr Slack
> > about an issue with the MMap directory and vm.max_map_count)* and some
> > GC settings, but we are really open to any suggestions. We're also happy
> > to use this cluster to test patches at large scale to see if they help
> > with performance *(more specifically, the two Solr tickets listed
> > above)*.
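For reference, the two things tried above can be sketched roughly as follows. The solr.http1 system property is the one discussed in SOLR-15840, and vm.max_map_count is the standard Linux kernel limit that MMapDirectory can exhaust on very large indexes; the file paths and the 262144 value are illustrative, not recommendations from the thread:

```shell
# Force Solr's internal HTTP client back to HTTP/1.1
# (system property discussed in SOLR-15840), e.g. by
# appending it to the start-up options in solr.in.sh:
SOLR_OPTS="$SOLR_OPTS -Dsolr.http1=true"

# Check the kernel's memory-map limit, which MMapDirectory
# can run into on very large indexes:
sysctl vm.max_map_count

# Raise it persistently (example value, path varies by distro):
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```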
> > I thought it would be useful to share some metrics I collected from two
> > clusters, one running 7.7.2 and one running 8.11.2, where the 8.11.2
> > cluster was active and all traffic was shadow-loaded into the 7.7.2
> > cluster for comparison. It's important to note that both clusters had
> > the same configuration; to name a few settings:
> >
> > - G1GC garbage collector
> > - TLOG replication
> > - 27Gi memory per instance
> > - 16Gi assigned to -Xmx and -Xms
> > - 16 cores
> > - -XX:G1HeapRegionSize=4m
> > - -XX:G1ReservePercent=20
> > - -XX:InitiatingHeapOccupancyPercent=35
> >
> > One metric that did stand out was that 8.11.2 was churning through
> > *a lot* of eden space in the heap, which can be seen in the screenshots
> > of metrics below:
> >
> > [Screenshots, stripped by the list: Total Memory Usage (7.7.2 vs
> > 8.11.2), Total Used G1 Pools (7.7.2 vs 8.11.2), and the overall thread
> > pool (7.7.2 vs 8.11.2)]
> >
> > Any guidance, or requests to test anything performance-wise, would be
> > appreciated.
> >
> > Thanks,
> >
> > Richard
>
> --
> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> Founding member of The Search Network <http://www.thesearchnetwork.com>
> and co-author of Searching the Enterprise
> <https://opensourceconnections.com/wp-content/uploads/2020/08/ES_book_final_journal_version.pdf>

--
Richard Goodman (he/him) | Senior Data Infrastructure engineer
richa...@brandwatch.com
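For anyone comparing against their own setup, the shared JVM settings listed in the thread would correspond to start-up options along these lines. This is a sketch assuming they are set via solr.in.sh (SOLR_HEAP and GC_TUNE are the standard variables there); the exact mechanism on k8s may differ:

```shell
# solr.in.sh - illustrative; mirrors the settings listed above
SOLR_HEAP="16g"                          # expands to -Xms16g -Xmx16g
GC_TUNE="-XX:+UseG1GC \
  -XX:G1HeapRegionSize=4m \
  -XX:G1ReservePercent=20 \
  -XX:InitiatingHeapOccupancyPercent=35"
```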