Hi all.

I'm seeing very slow responses from my Solr cluster (two identical nodes on 
FreeBSD 12.3) for the request /admin/info/system.
The Solr web UI uses this request to render its home page: the browser 
receives the Solr home page but can't render the cluster page (the one with 
the collection list ...).
Both production nodes are affected, but one of them is affected more often 
than the other.

Another cluster (test environment, 2 Solr nodes on a single server, with less 
memory and a smaller Java heap for each instance) is not affected. The 
production and test environments have very similar configurations: the same 
solr.xml (default), almost the same solrconfig.xml for every collection, and 
the same number of collections.

The response time on the affected nodes varies from 6.5 seconds to 10.5-11 
seconds, with peaks of 30-40 s. I think that 10 seconds is the timeout set for 
web UI rendering, and that's why the rendering fails.

2022-05-17 15:01:06.075 INFO  (qtp1434234664-23) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508} 
status=0 QTime=10481
2022-05-17 15:01:06.162 INFO  (qtp1434234664-17) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508} 
status=0 QTime=10568
2022-05-17 15:01:25.275 INFO  (qtp1434234664-18) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508} 
status=0 QTime=10393
2022-05-17 15:01:48.251 INFO  (qtp1434234664-18) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799697798} 
status=0 QTime=10358
2022-05-17 15:01:58.308 INFO  (qtp1434234664-23) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799697798} 
status=0 QTime=10411
2022-05-17 15:03:12.211 INFO  (qtp1434234664-17) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402} 
status=0 QTime=6712       <-- solr process after stop/start cycle
2022-05-17 15:03:15.988 INFO  (qtp1434234664-22) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402} 
status=0 QTime=3769
2022-05-17 15:03:48.958 INFO  (qtp1434234664-23) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402} 
status=0 QTime=6717
2022-05-17 15:04:12.216 INFO  (qtp1434234664-23) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799845440} 
status=0 QTime=6688
2022-05-17 15:04:18.906 INFO  (qtp1434234664-19) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799845440} 
status=0 QTime=6683
2022-05-17 15:04:39.960 INFO  (qtp1434234664-16) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799873194} 
status=0 QTime=6681

The test environment responds in roughly 0.5-1.7 seconds:
2022-05-17 15:09:44.046 INFO  (qtp1434234664-20) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739} 
status=0 QTime=543
2022-05-17 15:11:19.662 INFO  (qtp1434234664-85) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739} 
status=0 QTime=707
2022-05-17 15:11:19.663 INFO  (qtp1434234664-81) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739} 
status=0 QTime=691
2022-05-17 15:11:40.720 INFO  (qtp1434234664-17) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800298908} 
status=0 QTime=1728
2022-05-17 15:11:40.721 INFO  (qtp1434234664-14) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800298908} 
status=0 QTime=1729
2022-05-18 06:33:14.874 INFO  (qtp1434234664-17) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652855590602} 
status=0 QTime=685
2022-05-18 06:33:14.890 INFO  (qtp1434234664-85) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652855590602} 
status=0 QTime=700
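
To compare the two environments without reading the entries by eye, the QTime 
values can be extracted from the log text. This is only a minimal sketch: the 
function names qtimes_ms and summarize are made up for illustration, and the 
regular expression assumes the HttpSolrCall entry layout shown in the excerpts 
above (including the line wrapping introduced by this mail).

```python
import re
from statistics import mean

# Assumption: entries follow the HttpSolrCall layout seen in the excerpts above.
QTIME_RE = re.compile(
    r"path=/admin/info/system\s+params=\{[^}]*\}\s+status=\d+\s+QTime=(\d+)"
)


def qtimes_ms(log_text):
    """Return the QTime values (milliseconds) found in log_text, in order."""
    # Entries are wrapped over several lines here, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [int(value) for value in QTIME_RE.findall(flat)]


def summarize(values):
    """Return count, min, max and rounded mean of a non-empty list of QTimes."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values)),
    }


# Three production entries copied from the excerpt above.
sample = """
2022-05-17 15:01:06.162 INFO  (qtp1434234664-17) [   ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508}
status=0 QTime=10568
2022-05-17 15:03:12.211 INFO  (qtp1434234664-17) [   ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402}
status=0 QTime=6712       <-- solr process after stop/start cycle
2022-05-17 15:03:15.988 INFO  (qtp1434234664-22) [   ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402}
status=0 QTime=3769
"""

print(summarize(qtimes_ms(sample)))
```

Running the same two functions over the full log file of each node would show 
whether one production node is really slower than the other.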


During the request I can see the CPU usage of the Solr process climb to 
100-120% for 6-7 seconds. This peak does not occur in the test environment.


Any ideas? None of the servers is under load: we will go into production in 
the next few months.
Is it possible to debug this specific request?
Could it be a ZooKeeper bottleneck?
Last but not least, response times for normal queries are very satisfactory.
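
To reproduce the measurement outside the browser, the request can be timed 
directly. This is a rough sketch, not a definitive test: the URL in the 
commented usage lines (localhost, port 8983, the /solr prefix) is an assumption 
and must be adapted to the actual nodes, and time_request is a hypothetical 
helper name.

```python
import time
import urllib.request


def time_request(url, opener=urllib.request.urlopen, timeout=60):
    """Fetch url once and return (elapsed_seconds, response_size_in_bytes).

    opener is injectable so the function can be exercised without a live node.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        body = response.read()
    return time.perf_counter() - start, len(body)


# Usage (assumed URL, adjust host and port for each node):
# url = "http://localhost:8983/solr/admin/info/system?wt=json"
# for _ in range(5):
#     elapsed, size = time_request(url)
#     print(f"{elapsed:.2f} s, {size} bytes")
```

Timing the request from the node itself and from another machine would help 
separate the time spent inside Solr from any network delay.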


Best regards,
Paolo Tealdi

Ing. Paolo Tealdi
Area IT - Politecnico Torino
Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906625
Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY
Skype : tealdi.paolo