It happens because you use *-z <zk-url>* to connect to Solr. When you do that, the prometheus-exporter assumes it is connecting to a SolrCloud environment and collects the metrics from every node in the cluster. Since you have started 3 prometheus-exporters, each of them collects the full set of cluster metrics, which is why Prometheus sees the same series three times.

You can fix this in two different ways:

1- use *-h <your-local-solr-url>* instead of *-z <zk-url>*, so each exporter only scrapes its local node
2- run only one instance of the prometheus-exporter in the cluster

Note that solution 1 will not retrieve the metrics you have configured in the *<collections>* tag of your configuration, as *-h* assumes a non-SolrCloud instance.
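For solution 1, here is a minimal sketch of what the exporter command at the end of your ExecStart could look like, only swapping the ZooKeeper flag for the local Solr URL. It assumes the exporter container can reach the local Solr node at http://localhost:32080/solr (the port shown in your base_url labels) thanks to --net=host, and that the standalone flag takes a full base URL; double-check the exact option name with bin/solr-exporter --help on your version, and adjust host/port to your setup:

  # hypothetical adjustment: scrape only the local node instead of the whole cluster via ZooKeeper
  /opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
    -p 8984 \
    -h http://localhost:32080/solr \
    -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml \
    -n 4

With that change, each of the three exporters should only expose the series of its own node, and the duplicate-sample errors on the Prometheus side should stop.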
Regards,
Mathieu

On Wed, Aug 11, 2021 at 9:32 AM Joshua Hendrickson <jhendrick...@tripadvisor.com> wrote:

> Hello,
>
> Our organization has implemented Solr 8.9.0 for a production use case. We
> have standardized on Prometheus for metrics collection and storage. We
> export metrics from our Solr cluster by deploying the public Solr image for
> version 8.9.0 to an EC2 instance and using Docker to run the exporter
> binary against Solr (which is running in a container on the same host). Our
> Prometheus scraper (hosted in Kubernetes and configured via a Helm chart)
> reports errors like the following on every scrape:
>
> ts=2021-08-10T16:44:13.929Z caller=dedupe.go:112 component=remote
> level=error remote_name=11d3d0 url=https://our.endpoint/push
> msg="non-recoverable error" count=500 err="server returned HTTP status 400
> Bad Request: user=nnnnn: err: duplicate sample for timestamp.
> timestamp=2021-08-10T16:44:13.317Z,
> series={__name__=\"solr_metrics_core_time_seconds_total\",
> aws_account=\"our-account\", base_url=\"
> http://fqdn.for.solr.server:32080/solr\", category=\"QUERY\",
> cluster=\"our-cluster\", collection=\"a-collection\",
> core=\"a_collection_shard1_replica_t13\", dc=\"aws\", handler=\"/select\",
> instance=\" fqdn.for.solr.server:8984\", job=\"solr\",
> replica=\"replica_t13\", shard=\"shard1\"}"
>
> We have confirmed that there are indeed duplicate time series when we
> query our prometheus exporter. Here is a sample that shows the duplicate
> time series:
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="
> http://fqdn3.for.solr.server:32080/solr",} 1.533471301599E9
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="
> http://fqdn3.for.solr.server:32080/solr",} 8.89078653472891E11
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="
> http://fqdn3.for.solr.server:32080/solr",} 8.9061212477449E11
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="
> http://fqdn2.for.solr.server:32080/solr",} 1.63796914645E9
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="
> http://fqdn2.for.solr.server:32080/solr",} 9.05314998357273E11
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="
> http://fqdn2.for.solr.server:32080/solr",} 9.06952967503723E11
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="
> http://fqdn1.for.solr.server:32080/solr",} 1.667842814432E9
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="
> http://fqdn1.for.solr.server:32080/solr",} 9.1289401347629E11
>
> solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="
> http://fqdn1.for.solr.server:32080/solr",} 9.14561856290722E11
>
> This is the systemd unit file that runs the exporter container:
>
> [Unit]
> Description=Solr Exporter Docker
> After=network.target
> Wants=network.target
> Requires=docker.service
> After=docker.service
>
> [Service]
> Type=simple
> ExecStart=/usr/bin/docker run --rm \
> --name=solr-exporter \
> --net=host \
> --user=solr \
> solr:8.9.0 \
> /opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
> -p 8984 -z the-various-zookeeper-endpoints -f
> /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml -n 4
>
> ExecStop=/usr/bin/docker stop -t 2 solr-exporter
> Restart=on-failure
>
> [Install]
> WantedBy=multi-user.target
>
> I looked into the XML configurations for prometheus-exporter between 8.6.2
> (the previous version we used) and latest, and it looks like at some point
> recently there was a major refactoring in how this works. Is there
> something we are missing? Can anyone reproduce this issue on 8.9?
>
> Thanks in advance,
> Joshua Hendrickson
>

--
Mathieu Marie
Software Engineer | Salesforce
Mobile: + 33 6 98 59 62 31