400 collections is a lot. Metrics are collected for every core, across a 
number of different categories. With 400 collections, each split into X shards 
and replicated Y ways, I would expect a huge response, and building and 
aggregating all of those metrics is probably what leads to your timeout.

The metrics endpoint does not call every node. If you want a full view of the 
metrics for your cluster, you need to call every node and aggregate those 
metrics yourself, which is what the Prometheus exporter does for you.
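
If you do want to roll this yourself, here is a rough Python sketch of the
"call every node and aggregate" idea. The node URLs, the function names and the
per-node result layout are my own assumptions for illustration, and I am
assuming the JSON response carries a top-level "metrics" object. It is not how
the Prometheus exporter is implemented.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Fetch a URL and parse the body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect_cluster_metrics(node_urls, fetch=fetch_json):
    """Call /admin/metrics on each node and key the results by node URL.

    node_urls are base URLs such as "http://solr1:8983/solr" (assumed names).
    A node that fails or times out is recorded with an "error" entry so one
    slow node does not hide the others.
    """
    combined = {}
    for node in node_urls:
        url = f"{node.rstrip('/')}/admin/metrics?wt=json"
        try:
            combined[node] = fetch(url).get("metrics", {})
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            combined[node] = {"error": str(exc)}
    return combined
```

Passing the fetch function in as a parameter is only there so you can try the
aggregation logic against canned responses before pointing it at real nodes.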

Maybe try using the query parameters that let you filter the response, so that 
it contains only the metrics you need and is much smaller.
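
As a sketch of what I mean by filtering, the snippet below builds a metrics URL
restricted with the group and prefix parameters. The host, the helper name and
the specific group/prefix values are only examples I picked, so check the
Metrics API page of the reference guide for the filters that fit your case.

```python
from urllib.parse import urlencode


def metrics_url(base_url, groups=(), prefixes=(), wt="json"):
    """Build an /admin/metrics URL limited to the given groups and prefixes.

    base_url is the Solr root, e.g. "http://localhost:8983/solr" (assumed).
    Multiple values are joined with commas, which the endpoint accepts.
    """
    params = [("wt", wt)]
    if groups:
        params.append(("group", ",".join(groups)))
    if prefixes:
        params.append(("prefix", ",".join(prefixes)))
    # safe="," keeps the commas readable instead of percent-encoding them
    query = urlencode(params, safe=",")
    return f"{base_url.rstrip('/')}/admin/metrics?{query}"


# Example: only JVM heap metrics instead of every core's full metric set
print(metrics_url("http://localhost:8983/solr", groups=["jvm"], prefixes=["memory.heap"]))
```

Leaving out the core group, or narrowing it with a prefix, should cut the
response down a lot when you have that many cores.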

From: users@solr.apache.org At: 12/12/24 07:26:56 UTC-5:00
To: users@solr.apache.org
Subject: SolrCloud /admin/metrics endpoint timeouts

Hi everyone,

When we try to export metrics from a large SolrCloud installation of a
dozen VMs with around 400 collections, our curl calls to the
/solr/admin/metrics endpoint time out.
With smaller clusters, the response is immediate and we get all the details
right away.

What do you think is the problem?
Where should we look for debugging? I couldn't see anything in the logs.
Maybe I am looking at the wrong stuff.
How does metric collection work? When I perform a call, does the master
node call every other node one by one? Which port?
Do you recommend another way to collect the metrics?

We are using Solr 9.2

Best regards
yunus

