Hello,

Our organization has implemented Solr 8.9.0 for a production use case. We have 
standardized on Prometheus for metrics collection and storage. We export 
metrics from our Solr cluster by deploying the public Solr image for version 
8.9.0 to an EC2 instance and using Docker to run the exporter binary against 
Solr (which is running in a container on the same host). Our Prometheus scraper 
(hosted in Kubernetes and configured via a Helm chart) reports errors like the 
following on every scrape:

ts=2021-08-10T16:44:13.929Z caller=dedupe.go:112 component=remote level=error 
remote_name=11d3d0 url=https://our.endpoint/push msg="non-recoverable error" 
count=500 err="server returned HTTP status 400 Bad Request: user=nnnnn: err: 
duplicate sample for timestamp. timestamp=2021-08-10T16:44:13.317Z, 
series={__name__=\"solr_metrics_core_time_seconds_total\", 
aws_account=\"our-account\", 
base_url=\"http://fqdn.for.solr.server:32080/solr\", category=\"QUERY\", 
cluster=\"our-cluster\", collection=\"a-collection\", 
core=\"a_collection_shard1_replica_t13\", dc=\"aws\", handler=\"/select\", 
instance=\"fqdn.for.solr.server:8984\", job=\"solr\", replica=\"replica_t13\", 
shard=\"shard1\"}"

We have confirmed that there are indeed duplicate time series when we query our 
Prometheus exporter. Here is a sample that shows the duplicate time series:

solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 1.533471301599E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 8.89078653472891E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 8.9061212477449E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 1.63796914645E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 9.05314998357273E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 9.06952967503723E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 1.667842814432E9
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 9.1289401347629E11
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 9.14561856290722E11
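To confirm the duplicates without eyeballing the output, a small script can count how often each series (metric name plus label set) appears in a single scrape. This is a hypothetical helper of our own, not part of the Solr exporter. It assumes the standard Prometheus text format with each sample on one line (any wrapping in the excerpt above comes from email), and it compares label blocks as raw strings rather than parsing and sorting the labels.

```python
from collections import Counter


def series_key(line: str) -> str:
    """Return the metric name plus its raw label block, dropping the sample value."""
    line = line.strip()
    close = line.rfind("}")
    if close != -1:
        # Labelled series: keep everything up to and including the closing brace.
        return line[: close + 1]
    # Unlabelled series: the metric name is the first whitespace-separated token.
    return line.split()[0]


def find_duplicate_series(exposition: str) -> dict:
    """Count each series in one scrape and return only those seen more than once."""
    counts = Counter(
        series_key(line)
        for line in exposition.splitlines()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if line.strip() and not line.lstrip().startswith("#")
    )
    return {key: n for key, n in counts.items() if n > 1}
```

For example, after saving a scrape with `curl -s http://localhost:8984/metrics > metrics.txt` (assuming the exporter serves the usual `/metrics` path on the port passed with `-p`), `find_duplicate_series(open("metrics.txt").read())` would map each repeated series to the number of times it appears.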

This is the systemd unit file that runs the exporter container:

[Unit]
Description=Solr Exporter Docker
After=network.target
Wants=network.target
Requires=docker.service
After=docker.service

[Service]
Type=simple
ExecStart=/usr/bin/docker run --rm \
--name=solr-exporter \
--net=host \
--user=solr \
solr:8.9.0 \
/opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
-p 8984 -z the-various-zookeeper-endpoints \
-f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml -n 4

ExecStop=/usr/bin/docker stop -t 2 solr-exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

I compared the prometheus-exporter XML configuration in 8.6.2 (the version we 
ran previously) with the latest release, and it looks like the configuration 
was substantially refactored at some point between the two. Is there something 
we are missing? Can anyone reproduce this issue on 8.9?

Thanks in advance,
Joshua Hendrickson 
