Hello,

Our organization has implemented Solr 8.9.0 for a production use case. We have standardized on Prometheus for metrics collection and storage. We export metrics from our Solr cluster by deploying the public Solr image for version 8.9.0 to an EC2 instance and using Docker to run the exporter binary against Solr (which is running in a container on the same host). Our Prometheus scraper (hosted in Kubernetes and configured via a Helm chart) reports errors like the following on every scrape:

ts=2021-08-10T16:44:13.929Z caller=dedupe.go:112 component=remote level=error remote_name=11d3d0 url=https://our.endpoint/push msg="non-recoverable error" count=500 err="server returned HTTP status 400 Bad Request: user=nnnnn: err: duplicate sample for timestamp. timestamp=2021-08-10T16:44:13.317Z, series={__name__=\"solr_metrics_core_time_seconds_total\", aws_account=\"our-account\", base_url=\"http://fqdn.for.solr.server:32080/solr\", category=\"QUERY\", cluster=\"our-cluster\", collection=\"a-collection\", core=\"a_collection_shard1_replica_t13\", dc=\"aws\", handler=\"/select\", instance=\" fqdn.for.solr.server:8984\", job=\"solr\", replica=\"replica_t13\", shard=\"shard1\"}"

We have confirmed that the Prometheus exporter does return duplicate time series when we query it directly. Here is a sample in which each series (identical name and labels) appears three times with different values:

solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 1.533471301599E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 8.89078653472891E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 8.9061212477449E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 1.63796914645E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 9.05314998357273E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 9.06952967503723E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 1.667842814432E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 9.1289401347629E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 9.14561856290722E11

This is the systemd unit file that runs the exporter container:

[Unit]
Description=Solr Exporter Docker
After=network.target
Wants=network.target
Requires=docker.service
After=docker.service

[Service]
Type=simple
ExecStart=/usr/bin/docker run --rm \
--name=solr-exporter \
--net=host \
--user=solr \
solr:8.9.0 \
/opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
-p 8984 -z the-various-zookeeper-endpoints -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml -n 4
ExecStop=/usr/bin/docker stop -t 2 solr-exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

I compared the prometheus-exporter XML configuration in 8.6.2 (the previous version we used) with the one in the latest release, and it looks like at some point recently there was a major
refactoring in how this works. Is there something we are missing? Can anyone reproduce this issue on 8.9?

Thanks in advance,
Joshua Hendrickson
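
P.S. For anyone trying to reproduce this, here is a minimal sketch of how duplicate series could be detected in the exporter's text output. It is only an illustration: the function name find_duplicate_series is made up for this example, and it assumes that duplicated series are emitted with their labels in the same order (as in the sample above), so a plain string comparison of the name and label set is enough.

```python
from collections import defaultdict


def find_duplicate_series(exposition_text):
    """Return {series: [values]} for every series that appears more than once.

    A "series" here is the metric name plus its label block, compared as a
    plain string. This assumes duplicates carry their labels in the same
    order, which holds for the sample in this message.
    """
    seen = defaultdict(list)
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        close = line.rfind("}")
        if close != -1:
            # Labelled sample: everything up to the closing brace is the series.
            series, rest = line[: close + 1], line[close + 1:].split()
        else:
            # Unlabelled sample: the first token is the metric name.
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            # rest[0] is the value; an optional timestamp may follow it.
            seen[series].append(rest[0])
    return {s: v for s, v in seen.items() if len(v) > 1}
```

To run it against a live exporter, the text could be fetched with something like urllib.request.urlopen("http://localhost:8984/metrics").read().decode() and passed to the function. The port comes from the -p 8984 flag in the unit file; the /metrics path is an assumption about the exporter's endpoint.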