[ https://issues.apache.org/jira/browse/SOLR-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631201#comment-17631201 ]
Ishan Chattopadhyaya commented on SOLR-16531:
---------------------------------------------

bq. So the screenshot and associated webpage show that the red line jumps from a little over 325-ish on the prior commit (3ceae7 - "Pin OS of docker image...") to a little over 350-ish with the JAX-RS commit. (Do you have access to the specific numbers there, Ishan?). So the delta of whatever this test is doing is ~25s.

Yes, roughly. Attached the raw results above.

bq. The performance test involves two tasks. According to the cluster-test.json file linked above, the first task involves collection creation (solely), and the second task is a restart of each node after all the collection creation is done.

Correct.

bq. Again, going from cluster-test.json, it looks like task 1 creates 1000 collections, but doesn't specify how many shards or replicas each collection has? Does that mean 1s, 1r, or are there other defaults?

There's a file that contains all the collections: https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json#L5

bq. Going from cluster-test.json, the cluster either has 8 or 7 nodes (not sure how to understand/reconcile the properties here and here.).

There are 8 nodes, but only 7 of them are restarted (to avoid restarting the overseer node).

bq. Assuming 8 nodes total going forward, each node would host roughly 1000 * replicasPerShard * shardsPerCollection / 8 cores. Or 125 * replicasPerShard * shardsPerCollection.

I remember there being around 750 cores/node in this test. It can be calculated from that cluster state file I linked to above.

bq. Now, "task 2" itself involves restarting these loaded nodes 2 at a time and waiting for everything to be healthy between batches of restarts. If each node is restarted once, that means "task 2" would kick off restarts 4 times (again, doing 2 in parallel each time).

Right.

bq. So the "before" performance of ~325s translates to a restart of a node with 125 * replicasPerShard * shardsPerCollection total cores taking about 81s...

I'm not sure I follow. Here's how I think about this: 750 replicas/node and a total time of 325s, so approximately 81s to restart a node with ~750 replicas (assuming one more node is being restarted at the same time).

bq. And the "after" performance of ~350s translates to a similar restart now taking about 87s

Yes.

bq. So, ultimately, this perf test is telling us that JAX-RS makes restarts of heavily loaded nodes take ~7-8% longer i.e. (87.5 - 81.25)/81.25

Assuming the 325 and 350 are correct, this seems right to me.

> Performance degradation due to introduction of JAX-RS
> -----------------------------------------------------
>
>                 Key: SOLR-16531
>                 URL: https://issues.apache.org/jira/browse/SOLR-16531
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 9.2
>
>         Attachments: Screenshot from 2022-11-09 11-20-44.png, results-with-patch.tar.gz
>
>
> During performance benchmarking on branch_9x, I observed a slowdown in restart performance since the commits in SOLR-16347. See attached screenshot.
> CC [~gerlowskija].
> http://mostly.cool/cluster-test-with-patch.html
> The benchmark is here: https://github.com/fullstorydev/solr-bench/blob/ishan/repeatable-jenkins/suites/cluster-test.json.
> This suite was run after retroactively applying the parallelStream patch from SOLR-16414: https://github.com/apache/solr/commit/b33161d0cdd976fc0c3dc78c4afafceb4db671cf.diff
>
> Effort to automate these benchmarks is WIP and tracked here: SOLR-16525.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
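
For reference, the restart arithmetic discussed in the comment can be checked with a short Python sketch. It uses only the rounded figures quoted in the thread (about 325s before, about 350s after, 4 restart batches of 2 nodes each, 1000 collections on 8 nodes, roughly 750 cores per node), not the attached raw results. The function names are hypothetical and exist only for this illustration.

```python
# Rough check of the restart arithmetic from the thread.
# All inputs are the rounded values quoted in the comment (assumptions),
# not numbers taken from the attached raw benchmark results.


def per_batch_seconds(total_seconds: float, batches: int) -> float:
    """Average wall-clock time of one restart batch."""
    return total_seconds / batches


def relative_slowdown(before: float, after: float) -> float:
    """Fractional increase from `before` to `after`."""
    return (after - before) / before


def implied_cores_per_collection(cores_per_node: float, nodes: int,
                                 collections: int) -> float:
    """Average shards * replicas per collection implied by the core count."""
    return cores_per_node * nodes / collections


BATCHES = 4  # assumed in the thread: nodes restarted 2 at a time, 4 rounds

before = per_batch_seconds(325, BATCHES)
after = per_batch_seconds(350, BATCHES)
slowdown = relative_slowdown(before, after)
implied = implied_cores_per_collection(750, nodes=8, collections=1000)

print(f"before: {before:.2f}s per batch")   # 81.25
print(f"after:  {after:.2f}s per batch")    # 87.50
print(f"slowdown: {slowdown:.1%}")          # 7.7%
print(f"avg cores per collection: {implied:.1f}")  # 6.0
```

Because both totals are divided by the same number of batches, the relative slowdown is the same as (350 - 325) / 325, so the ~7-8% estimate does not depend on the exact batch count.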