Mathieu,

We have changed our Prometheus configuration to scrape only one pod in the cluster, but we still see the error shown below. Is there anything else we can try?
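One more thing we are planning to check is whether the duplicates are already present in a single exporter's own output, rather than being introduced on the Prometheus side. Something like the following, run on the EC2 host where the exporter container runs (assuming the exporter still listens on port 8984 and serves /metrics, as in our unit file quoted further down), should print any label sets that occur more than once:

curl -s http://localhost:8984/metrics \
  | grep '^solr_metrics_core_time_seconds_total' \
  | sed 's/ [^ ]*$//' \
  | sort | uniq -d

If this prints anything, the exporter itself is emitting the same series more than once, and changing the scrape configuration alone will not fix it.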
On 2021/08/11 08:58:34, Mathieu Marie <m...@salesforce.com.INVALID> wrote:
> It happens because you use -z zk-url to connect to Solr.
>
> When you do that, the prometheus-exporter assumes that it connects to a
> SolrCloud environment and will collect the metrics from all nodes.
> Given you have started 3 prometheus-exporters, each one of them will
> collect all metrics from the cluster.
>
> You can fix this in two different ways:
> 1- use -h <your-local-solr-url> instead of -z <zk-url>
> 2- have only one instance of the prometheus-exporter in the cluster
>
> Note that solution 1 will not retrieve the metrics you have configured in
> the <collections> tag in your configuration, as -h assumes a non-SolrCloud
> instance.
>
> Regards,
> Mathieu
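On solution 2: since the exporters run directly on the EC2 hosts rather than in Kubernetes, one check we can still do is to confirm that only one host is actually running the container (container name taken from our unit file, quoted further down), for example:

docker ps --filter name=solr-exporter --format '{{.Names}} {{.Status}}'

run on each host; only one of them should report a running solr-exporter container.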
> On Wed, Aug 11, 2021 at 9:32 AM Joshua Hendrickson <jhendrick...@tripadvisor.com> wrote:
> >
> > Hello,
> >
> > Our organization has implemented Solr 8.9.0 for a production use case. We
> > have standardized on Prometheus for metrics collection and storage. We
> > export metrics from our Solr cluster by deploying the public Solr image
> > for version 8.9.0 to an EC2 instance and using Docker to run the exporter
> > binary against Solr (which is running in a container on the same host).
> > Our Prometheus scraper (hosted in Kubernetes and configured via a Helm
> > chart) reports errors like the following on every scrape:
> >
> > ts=2021-08-10T16:44:13.929Z caller=dedupe.go:112 component=remote
> > level=error remote_name=11d3d0 url=https://our.endpoint/push
> > msg="non-recoverable error" count=500 err="server returned HTTP status 400
> > Bad Request: user=nnnnn: err: duplicate sample for timestamp.
> > timestamp=2021-08-10T16:44:13.317Z,
> > series={__name__=\"solr_metrics_core_time_seconds_total\",
> > aws_account=\"our-account\", base_url=\"http://fqdn.for.solr.server:32080/solr\",
> > category=\"QUERY\", cluster=\"our-cluster\", collection=\"a-collection\",
> > core=\"a_collection_shard1_replica_t13\", dc=\"aws\", handler=\"/select\",
> > instance=\"fqdn.for.solr.server:8984\", job=\"solr\",
> > replica=\"replica_t13\", shard=\"shard1\"}"
> >
> > We have confirmed that there are indeed duplicate time series when we
> > query our prometheus exporter. Here is a sample that shows the duplicate
> > time series:
> >
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 1.533471301599E9
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 8.89078653472891E11
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t1",collection="a_collection",shard="shard1",replica="replica_t1",base_url="http://fqdn3.for.solr.server:32080/solr",} 8.9061212477449E11
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 1.63796914645E9
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 9.05314998357273E11
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t3",collection="a_collection",shard="shard1",replica="replica_t3",base_url="http://fqdn2.for.solr.server:32080/solr",} 9.06952967503723E11
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 1.667842814432E9
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 9.1289401347629E11
> > solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="a_collection_shard1_replica_t5",collection="a_collection",shard="shard1",replica="replica_t5",base_url="http://fqdn1.for.solr.server:32080/solr",} 9.14561856290722E11
> >
> > This is the systemd unit file that runs the exporter container:
> >
> > [Unit]
> > Description=Solr Exporter Docker
> > After=network.target
> > Wants=network.target
> > Requires=docker.service
> > After=docker.service
> >
> > [Service]
> > Type=simple
> > ExecStart=/usr/bin/docker run --rm \
> > --name=solr-exporter \
> > --net=host \
> > --user=solr \
> > solr:8.9.0 \
> > /opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
> > -p 8984 -z the-various-zookeeper-endpoints -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml -n 4
> >
> > ExecStop=/usr/bin/docker stop -t 2 solr-exporter
> > Restart=on-failure
> >
> > [Install]
> > WantedBy=multi-user.target
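For comparison, solution 1 applied to this ExecStart would look roughly like the command below. The base URL is only a guess on my part (the local Solr node listening on port 32080 on the same host), and the exact flag name for the non-cloud mode should be confirmed against the exporter's help output for 8.9:

/usr/bin/docker run --rm \
  --name=solr-exporter \
  --net=host \
  --user=solr \
  solr:8.9.0 \
  /opt/solr/contrib/prometheus-exporter/bin/solr-exporter \
  -p 8984 -h http://localhost:32080/solr \
  -f /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml -n 4

As Mathieu notes, this variant will not collect the metrics configured in the <collections> tag of solr-exporter-config.xml.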
> > I looked into the XML configurations for the prometheus-exporter between
> > 8.6.2 (the previous version we used) and the latest release, and it looks
> > like there was a major refactoring of how this works at some point
> > recently. Is there something we are missing? Can anyone reproduce this
> > issue on 8.9?
> >
> > Thanks in advance,
> > Joshua Hendrickson
>
> --
> Mathieu Marie
> Software Engineer | Salesforce
> Mobile: +33 6 98 59 62 31