[ https://issues.apache.org/jira/browse/SOLR-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848396#comment-17848396 ]

Matthew Biscocho commented on SOLR-10654:
-----------------------------------------

The post-processing overhead from JQ is not the main problem, but it is
certainly one. Running the Prometheus exporter can be costly, especially at
scale and when running multiple instances. It offers configuration flexibility,
but I don't think that covers everyone's use case, since running the Prometheus
exporter is not free. I think metric aggregation should happen at the Grafana or
Prometheus level, while the exposed metrics themselves should just be raw
values. With this PR, Prometheus can scrape Solr directly and the aggregation
can be done in Grafana, skipping the extra HTTP hops through the Prometheus
exporter and the JQ processing.

I took some time to compare the performance of my PR against the Prometheus
exporter. I created a SolrCloud cluster with 2 nodes and 50 collections to
generate a large number of metrics. I curled each node individually and
captured its response time. I'm not sure whether Prometheus scrapes the nodes
sequentially or in parallel, but each node takes only about 0.6s locally.

I modified the Prometheus exporter config to scrape only the metrics my PR
currently exports (the Core registry) and added a few lines of code to time the
scraping and JQ processing. It took around 4-5 seconds per collection interval,
which is significantly longer.

My PR:
{code}
curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'localhost:8983/solr/admin/metrics?wt=prometheus'
Total: 0.614125s
curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'localhost:7574/solr/admin/metrics?wt=prometheus'
Total: 0.597078s
{code}

Prometheus Exporter:
{code}
INFO  - 2024-05-21 18:10:28.930; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2024-05-21 18:11:28.923; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
PT4.355627S
{code}
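The timing code added to the exporter is not shown in this comment. As a rough illustration only: the PT4.355627S value matches the ISO-8601 form that java.time.Duration.toString() produces, so a measurement along the following lines would print output of that shape. The class and method names (CollectionTimer, timeCollection) are hypothetical, and the sleep merely stands in for the real scrape plus JQ work.

```java
import java.time.Duration;

// Hypothetical helper for illustration; NOT the code actually added to the
// Prometheus exporter.
public class CollectionTimer {

    // Runs the task once and returns the elapsed wall-clock time, measured
    // with the monotonic System.nanoTime() clock.
    static Duration timeCollection(Runnable collection) {
        long start = System.nanoTime();
        collection.run();
        return Duration.ofNanos(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in for "scrape Solr and run JQ": just sleep for 50 ms.
        Duration elapsed = timeCollection(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Duration.toString() uses ISO-8601, the same shape as PT4.355627S.
        System.out.println(elapsed);
    }
}
```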

I believe this is due to the HTTP calls and JQ processing the Prometheus
exporter has to perform, whereas my PR does a straight internal conversion.
Although my PR performs the conversion on every request, that does not appear
to be nearly as costly as the Prometheus exporter's approach.
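To give a feel for what a straight internal conversion involves: the Prometheus text exposition format can be emitted directly from in-memory values, with a # TYPE line per metric family followed by name{label="value"} number samples. The sketch below is hypothetical (class, method, metric, and label names are invented for illustration) and is not the code in the PR; it only shows the general shape of rendering raw values so that aggregation is left to Prometheus or Grafana.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of rendering raw metric values in the Prometheus
// text exposition format. NOT the SOLR-10654 implementation.
public class PrometheusTextSketch {

    // Escapes a label value as the text format requires: backslash first,
    // then double quote and newline.
    static String escapeLabelValue(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Renders one counter family: a TYPE line plus one sample per label value.
    static String renderCounter(String name, String labelName, Map<String, Long> valuesByLabel) {
        StringBuilder sb = new StringBuilder();
        sb.append("# TYPE ").append(name).append(" counter\n");
        for (Map.Entry<String, Long> e : valuesByLabel.entrySet()) {
            sb.append(name)
              .append('{').append(labelName).append("=\"")
              .append(escapeLabelValue(e.getKey())).append("\"} ")
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up per-core request counts: raw values, no pre-aggregation.
        Map<String, Long> requests = new LinkedHashMap<>();
        requests.put("collection1_shard1_replica_n1", 42L);
        requests.put("collection2_shard1_replica_n1", 7L);
        System.out.print(renderCounter("solr_core_requests_total", "core", requests));
    }
}
```

Because each request is just a walk over values already held in memory, there is no extra HTTP hop or JQ step; summing across cores or collections becomes a query-time concern in Prometheus or Grafana.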

> Expose Metrics in Prometheus format DIRECTLY from Solr
> ------------------------------------------------------
>
>                 Key: SOLR-10654
>                 URL: https://issues.apache.org/jira/browse/SOLR-10654
>             Project: Solr
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Keith Laban
>            Priority: Major
>         Attachments: prometheus_metrics.txt
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Expose metrics via a `wt=prometheus` response type.
> Example scrape_config in prometheus.yml:
> {code}
> scrape_configs:
>   - job_name: 'solr'
>     metrics_path: '/solr/admin/metrics'
>     params:
>       wt: ["prometheus"]
>     static_configs:
>       - targets: ['localhost:8983']
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
