Hi all. I'm reporting a very slow response from my solr cluster (two identical nodes, FreeBSD 12.3) with the query /admin/info/system. This query is used by solr UI web interface for home page rendering: The browser receive solr home page but can't render the solr cluster page ( the one with collection list ...) Both production nodes are affected by this problem, but statistically one of them is more affected than the other.
One other cluster (test environment, 2 solr nodes on a single server, with less memory and less java heap memory for each instance) is not affected. Production and test environment have very similar configurations, same solr.xml (default), almost same solrconfig.xml for every collection, same number of collections. The response for affected nodes varies from 6.5 seconds to 10.5/11 seconds, with peaks on 30/40 s. I think that 10 seconds is the timeout set for web UI rendering and that's why the rendering fails. 022-05-17 15:01:06.075 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508} status=0 QTime=10481 2022-05-17 15:01:06.162 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508} status=0 QTime=10568 2022-05-17 15:01:25.275 INFO (qtp1434234664-18) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508} status=0 QTime=10393 2022-05-17 15:01:48.251 INFO (qtp1434234664-18) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799697798} status=0 QTime=10358 2022-05-17 15:01:58.308 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799697798} status=0 QTime=10411 2022-05-17 15:03:12.211 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402} status=0 QTime=6712 <-- solr process after stop/start cycle 2022-05-17 15:03:15.988 INFO (qtp1434234664-22) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402} status=0 QTime=3769 2022-05-17 15:03:48.958 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402} status=0 QTime=6717 2022-05-17 15:04:12.216 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799845440} status=0 QTime=6688 2022-05-17 15:04:18.906 INFO (qtp1434234664-19) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799845440} status=0 QTime=6683 2022-05-17 15:04:39.960 INFO (qtp1434234664-16) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652799873194} status=0 QTime=6681 The test environment response is around 0.5-1.5 seconds. 2022-05-17 15:09:44.046 INFO (qtp1434234664-20) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739} status=0 QTime=543 2022-05-17 15:11:19.662 INFO (qtp1434234664-85) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739} status=0 QTime=707 2022-05-17 15:11:19.663 INFO (qtp1434234664-81) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739} status=0 QTime=691 2022-05-17 15:11:40.720 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652800298908} status=0 QTime=1728 2022-05-17 15:11:40.721 INFO (qtp1434234664-14) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652800298908} status=0 QTime=1729 2022-05-18 06:33:14.874 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652855590602} status=0 QTime=685 2022-05-18 06:33:14.890 INFO (qtp1434234664-85) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1652855590602} status=0 QTime=700 During the query I can see that process cpu goes to the top (100-120%) for 6-7 seconds. This peak is not seen in the test environment. Any idea ? All the server are not loaded: we will go in production in the next few month. It should be possible to debug this specific query ? It could be a zookeeper bottleneck ? Last but not least, normal query response are very satisfactory Best regards, Paolo Tealdi Ing. Paolo Tealdi Area IT - Politecnico Torino Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906625 Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY Skype : tealdi.paolo Please consider your environmental responsibility before printing this e-mail