Hi all,
I'm reporting very slow responses from my Solr cluster (two identical nodes,
FreeBSD 12.3) to the query /admin/info/system.
This query is used by the Solr web UI to render its home page: the browser
receives the Solr home page but can't render the cluster page (the one with
the collection list ...).
Both production nodes are affected by this problem, but on average one of
them is more affected than the other.
Another cluster (test environment, 2 Solr nodes on a single server, with less
memory and a smaller Java heap for each instance) is not affected. The
production and test environments have very similar configurations: the same
solr.xml (default), almost the same solrconfig.xml for every collection, and
the same number of collections.
The response time on the affected nodes varies from 6.5 to 10.5-11 seconds,
with peaks of 30-40 seconds. I think 10 seconds is the timeout set for the
web UI rendering, and that is why the rendering fails.
2022-05-17 15:01:06.075 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508}
status=0 QTime=10481
2022-05-17 15:01:06.162 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508}
status=0 QTime=10568
2022-05-17 15:01:25.275 INFO (qtp1434234664-18) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508}
status=0 QTime=10393
2022-05-17 15:01:48.251 INFO (qtp1434234664-18) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799697798}
status=0 QTime=10358
2022-05-17 15:01:58.308 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799697798}
status=0 QTime=10411
2022-05-17 15:03:12.211 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402}
status=0 QTime=6712 <-- solr process after stop/start cycle
2022-05-17 15:03:15.988 INFO (qtp1434234664-22) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402}
status=0 QTime=3769
2022-05-17 15:03:48.958 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402}
status=0 QTime=6717
2022-05-17 15:04:12.216 INFO (qtp1434234664-23) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799845440}
status=0 QTime=6688
2022-05-17 15:04:18.906 INFO (qtp1434234664-19) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799845440}
status=0 QTime=6683
2022-05-17 15:04:39.960 INFO (qtp1434234664-16) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799873194}
status=0 QTime=6681
The test environment responds in around 0.5-1.7 seconds:
2022-05-17 15:09:44.046 INFO (qtp1434234664-20) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739}
status=0 QTime=543
2022-05-17 15:11:19.662 INFO (qtp1434234664-85) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739}
status=0 QTime=707
2022-05-17 15:11:19.663 INFO (qtp1434234664-81) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739}
status=0 QTime=691
2022-05-17 15:11:40.720 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800298908}
status=0 QTime=1728
2022-05-17 15:11:40.721 INFO (qtp1434234664-14) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800298908}
status=0 QTime=1729
2022-05-18 06:33:14.874 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652855590602}
status=0 QTime=685
2022-05-18 06:33:14.890 INFO (qtp1434234664-85) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652855590602}
status=0 QTime=700
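To compare the two environments, the QTime values can be pulled out of log excerpts like the ones above. This is only a minimal sketch and not part of Solr: the names `qtimes_ms`, `summarize` and `SAMPLE` are made up for illustration, and the regular expression assumes the HttpSolrCall log layout shown in this message, including entries that wrap across lines.

```python
import re
import statistics

# Assumption: entries look like the HttpSolrCall lines quoted in this message.
# \s+ also matches newlines, so entries wrapped over several lines still match.
QTIME_RE = re.compile(
    r"path=/admin/info/system\s+params=\{[^}]*\}\s+status=\d+\s+QTime=(\d+)"
)

# Three entries copied from the logs above (two production, one test).
SAMPLE = """\
2022-05-17 15:01:06.162 INFO (qtp1434234664-17) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799655508}
status=0 QTime=10568
2022-05-17 15:03:15.988 INFO (qtp1434234664-22) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652799785402}
status=0 QTime=3769
2022-05-17 15:09:44.046 INFO (qtp1434234664-20) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/system params={wt=json&_=1652800171739}
status=0 QTime=543
"""


def qtimes_ms(log_text):
    """Return the QTime (milliseconds) of every /admin/info/system entry."""
    return [int(value) for value in QTIME_RE.findall(log_text)]


def summarize(values):
    """Return count, min, median and max for a non-empty list of QTimes."""
    return {
        "count": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


if __name__ == "__main__":
    print(summarize(qtimes_ms(SAMPLE)))
```

Running the same summary over the full log of each node would show whether the 6.5 s and 10.5 s groups are stable per node or change over time.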
During the query I can see the process CPU usage spike to 100-120% for 6-7
seconds. This spike does not occur in the test environment.
Any ideas? None of the servers is under load yet: we will go into production
in the next few months.
Is it possible to debug this specific query?
Could it be a ZooKeeper bottleneck?
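One way to take the browser and the web UI timeout out of the picture is to time the same request directly on each node. The following is a rough sketch under stated assumptions: the host and port (`localhost:8983`) are guesses at a default install and must be adjusted, and `time_call`, `fetch` and `measure` are hypothetical helper names, not Solr or library APIs.

```python
import time
import urllib.request

# Assumption: Solr answers on the default port on the local machine.
SOLR_URL = "http://localhost:8983/solr/admin/info/system?wt=json"


def time_call(fn):
    """Run fn() and return (result, elapsed_seconds) from a monotonic clock."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def fetch(url=SOLR_URL, timeout=60):
    """Fetch the URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def measure(attempts=5, url=SOLR_URL):
    """Time several consecutive requests and print one line per attempt."""
    for attempt in range(1, attempts + 1):
        body, elapsed = time_call(lambda: fetch(url))
        print(f"attempt {attempt}: {len(body)} bytes in {elapsed:.2f} s")
```

Calling `measure()` on a production node and on the test server while watching the CPU usage would show whether the client-side time matches the QTime in the log, which would point at the handler itself rather than at the network or the browser.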
Last but not least, normal query response times are very satisfactory.
Best regards,
Paolo Tealdi
Ing. Paolo Tealdi
Area IT -
Politecnico Torino
Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906625
Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY
Skype : tealdi.paolo