400 collections is a lot. Metrics are collected for every core, across a number of different categories. With 400 collections, each split into X shards and replicated Y ways, I would expect a huge response, and building and aggregating all of those metrics is probably what leads to your timeout.
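As a rough sketch of what a smaller request could look like, the Python below asks one node at a time for a filtered subset via the `group` and `prefix` parameters of `/solr/admin/metrics`. The node addresses, the `QUERY./select.requests` prefix, and the helper names are placeholders for illustration, not taken from your setup, and I am assuming the JSON response keeps its data under a top-level `metrics` key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical node addresses; replace with the hosts in your cluster.
NODES = ["http://solr1:8983", "http://solr2:8983"]


def metrics_url(node, group="core", prefix="QUERY./select.requests"):
    """Build a filtered /admin/metrics URL for a single node."""
    # group and prefix narrow the response; the values here are examples only.
    params = {"group": group, "prefix": prefix, "wt": "json"}
    return f"{node}/solr/admin/metrics?{urlencode(params)}"


def fetch_metrics(node, timeout=30, **filters):
    """Fetch one node's filtered metrics and return its 'metrics' section."""
    with urlopen(metrics_url(node, **filters), timeout=timeout) as resp:
        # Assumption: the payload has a top-level "metrics" object.
        return json.load(resp).get("metrics", {})


def collect(nodes, fetch=fetch_metrics):
    """Query each node separately and key the results by node address."""
    return {node: fetch(node) for node in nodes}
```

Calling `collect(NODES)` would then give one small response per node, which you can aggregate however you need.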
The metrics endpoint does not call every node. If you want a full view of your cluster's metrics, you need to call every node and aggregate the metrics yourself, which is what the Prometheus exporter does for you. Try using the query parameters that let you filter the response down to only the metrics you need.

From: users@solr.apache.org
At: 12/12/24 07:26:56 UTC-5:00
To: users@solr.apache.org
Subject: SolrCloud /admin/metrics endpoint timeouts

Hi everyone,

When we try to export metrics from a large SolrCloud installation of a dozen VMs with around 400 collections, our curl calls to the /solr/admin/metrics endpoint time out. With smaller clusters, the response is immediate and we get all the details right away.

What do you think is the problem? Where should we look for debugging? I couldn't see anything in the logs. Maybe I am looking at the wrong stuff.

How does metric collection work? When I perform a call, does the master node call every other node one by one? Which port?

Do you recommend another way to collect the metrics?

We are using Solr 9.2

Best regards
yunus