[ https://issues.apache.org/jira/browse/SOLR-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881123#comment-17881123 ]
David Smiley commented on SOLR-13234:
-------------------------------------

Years later, I'm observing that this PR creates an Http SolrClient per node. This isn't necessary; one can be used for all the nodes.

> Prometheus Metric Exporter Not Threadsafe
> -----------------------------------------
>
>                 Key: SOLR-13234
>                 URL: https://issues.apache.org/jira/browse/SOLR-13234
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - prometheus-exporter, metrics
>    Affects Versions: 7.6, 8.0
>            Reporter: Danyal Prout
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>              Labels: metric-collector
>             Fix For: 7.7.2, 8.1, 9.0
>
>         Attachments: SOLR-13234-branch_7x.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Solr Prometheus Exporter collects metrics when it receives an HTTP request
> from Prometheus. Prometheus sends this request on its [scrape
> interval|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config].
> When the time taken to collect the Solr metrics is greater than the scrape
> interval of the Prometheus server, this results in concurrent metric
> collection occurring in this
> [method|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L86].
> This method doesn't appear to be thread safe; for instance, you could have
> concurrent modifications of a
> [map|https://github.com/apache/lucene-solr/blob/master/solr/contrib/prometheus-exporter/src/java/org/apache/solr/prometheus/collector/SolrCollector.java#L119].
> After a while the Solr Exporter process becomes nondeterministic; we've
> observed NPEs and loss of metrics.
> To address this, I'm proposing the following fixes:
> 1. Read/parse the configuration at startup and make it immutable.
> 2. Collect metrics from Solr on an interval which is controlled by the Solr
> Exporter and cache the metric samples to return during Prometheus scraping.
> Metric collection can be expensive (for example, executing arbitrary Solr
> searches), so it's not ideal to allow concurrent metric collection, or
> collection on an interval which is not defined by the Solr Exporter.
> There are also a few other performance improvements that we've made while
> fixing this, for example using the ClusterStateProvider instead of sending
> multiple HTTP requests to each Solr node to look up all the cores.
> I'm currently finishing up these changes, which I'll submit as a PR.
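
For readers skimming the thread, the second fix proposed in the quoted description (collect on an exporter-controlled interval and serve cached samples to scrapes) might look roughly like the sketch below. It is a minimal illustration using only the JDK (Java 10+ for `List.copyOf`), not the code from the patch: `CachedMetricsCollector`, `refresh`, and `scrape` are hypothetical names, samples are modeled as plain strings, and the `Supplier` stands in for the real Solr queries.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch; these are not the exporter's actual classes. */
final class CachedMetricsCollector implements AutoCloseable {
  // Assumption: stands in for the expensive calls to Solr.
  private final Supplier<List<String>> source;
  // Latest immutable snapshot; scrape threads only ever read this reference.
  private final AtomicReference<List<String>> cache = new AtomicReference<>(List.of());
  // A single worker thread, so two collections can never overlap.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  CachedMetricsCollector(Supplier<List<String>> source) {
    this.source = source;
  }

  /** Starts collecting on an interval owned by the exporter, not by Prometheus. */
  void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(this::refresh, 0, intervalSeconds, TimeUnit.SECONDS);
  }

  /** Runs one collection and publishes the result as an immutable copy. */
  void refresh() {
    try {
      cache.set(List.copyOf(source.get()));
    } catch (RuntimeException e) {
      // Keep the previous snapshot. An exception escaping a scheduled task
      // would cancel all of its future runs, so it is swallowed here.
    }
  }

  /** Called on every scrape; safe from any number of threads. */
  List<String> scrape() {
    return cache.get();
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

With this shape, a slow collection delays only the next refresh; scrapes that arrive in the meantime receive the previous snapshot instead of triggering another collection. The cost is that samples can be up to one interval stale.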