[ https://issues.apache.org/jira/browse/SOLR-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848396#comment-17848396 ]
Matthew Biscocho commented on SOLR-10654:
-----------------------------------------

The post-processing overhead from JQ is not the main problem, but it is certainly one of them. Running the Prometheus Exporter can be costly, especially at scale with multiple instances. It offers flexibility through configuration, but I don't think that solves everyone's use case, since running the Prometheus Exporter is not free. I think the overhead of aggregating metrics should sit at the Grafana or Prometheus level, while the exposed metrics themselves should just be raw values. With this PR, Prometheus can scrape Solr directly and the aggregation can be done in Grafana, which skips the extra HTTP hops through the Prometheus Exporter and the JQ processing.

I took a bit of time to compare the performance of my PR against the Prometheus Exporter. I created a cloud with 2 nodes and 50 collections to get a bunch of metrics. For the cloud, I curl'd each node individually and captured the response time of each node. I'm not sure whether Prometheus scrapes sequentially or in parallel, but both nodes take around 0.6s locally.

I modified the Prometheus Exporter config to scrape only the same metrics my PR currently exports (the Core registry) and added a few lines of code to capture the time spent on scraping and JQ processing. It was taking around 4-5 seconds per collection interval, which is significantly longer.
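For anyone who wants to repeat the per-node timing described above without curl, here is a minimal Python sketch. It is not part of the PR: the function names `fetch_metrics` and `time_scrapes`, the `runs` parameter, and the use of a median over several runs are all assumptions made for illustration. The original measurement was a single curl call per node.

```python
import time
import urllib.request
from statistics import median
from typing import Callable, Dict, Iterable, List


def fetch_metrics(url: str) -> bytes:
    """Fetch one metrics response and return the raw body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def time_scrapes(
    urls: Iterable[str],
    runs: int = 5,
    fetch: Callable[[str], bytes] = fetch_metrics,
) -> Dict[str, float]:
    """Return the median wall-clock seconds per URL over `runs` sequential fetches.

    `fetch` is injectable so the timing loop can be exercised without a
    running Solr node.
    """
    timings: Dict[str, float] = {}
    for url in urls:
        samples: List[float] = []
        for _ in range(runs):
            start = time.perf_counter()
            fetch(url)  # the body is discarded, like curl -o /dev/null
            samples.append(time.perf_counter() - start)
        timings[url] = median(samples)
    return timings
```

Calling `time_scrapes` with `http://localhost:8983/solr/admin/metrics?wt=prometheus` and `http://localhost:7574/solr/admin/metrics?wt=prometheus` would time the two nodes used in this test, assuming both are up and serving the `wt=prometheus` response from the PR.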
`My PR:`

`curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'localhost:8983/solr/admin/metrics?wt=prometheus'`
`Total: 0.614125s`

`curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'localhost:7574/solr/admin/metrics?wt=prometheus'`
`Total: 0.597078s`

`Prometheus Exporter:`

INFO - 2024-05-21 18:10:28.930; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO - 2024-05-21 18:11:28.923; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
PT4.355627S

I believe this is due to the HTTP calls and JQ processing the Prometheus Exporter needs to do, while my PR does a straight internal conversion. Although the conversion happens on every call, it doesn't seem to be as costly as the Prometheus Exporter.

> Expose Metrics in Prometheus format DIRECTLY from Solr
> ------------------------------------------------------
>
> Key: SOLR-10654
> URL: https://issues.apache.org/jira/browse/SOLR-10654
> Project: Solr
> Issue Type: Improvement
> Components: metrics
> Reporter: Keith Laban
> Priority: Major
> Attachments: prometheus_metrics.txt
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Expose metrics via a `wt=prometheus` response type.
> Example scrape_config in prometheus.yml:
> {code}
> scrape_configs:
>   - job_name: 'solr'
>     metrics_path: '/solr/admin/metrics'
>     params:
>       wt: ["prometheus"]
>     static_configs:
>       - targets: ['localhost:8983']
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)