Hi,

I have a situation where I have a few "local" Prometheus servers sending 
data to a "global" server using the remote write API. I get errors that 
look like this on the remote write receiver:

```
ts=2022-02-03T12:41:11.244Z caller=write_handler.go:57 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp"
```

The senders get the same error back from the receiver, with an HTTP 400 status code.
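For reference, here is a minimal sketch of the kind of setup described above, under stated assumptions: each local server loads recording-rules.yaml and forwards its samples with remote_write, while the global server loads the same rule file and is started with `--web.enable-remote-write-receiver` (Prometheus 2.33 and later; older releases used `--enable-feature=remote-write-receiver`). The hostname `global-prometheus` is hypothetical, and scrape configs are omitted.

```yaml
# prometheus.yml on each "local" (sending) server.
# "global-prometheus:9090" is a hypothetical address for the receiver.
rule_files:
  - recording-rules.yaml   # the same rule file is also loaded on the receiver

remote_write:
  # The receiver accepts remote write requests on /api/v1/write when it is
  # started with --web.enable-remote-write-receiver.
  - url: http://global-prometheus:9090/api/v1/write
```

On the global server, the only relevant part is the same `rule_files` entry pointing at recording-rules.yaml.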

After much trial and error, I figured out that it happens because I have the
same recording rules on all servers, on both the senders and the receiver.
recording-rules.yaml looks like this:
```
groups:
  - name: node-exporter
    rules:
      # CPU cores per node
      - record: instance:node_cpus:count
        expr: count(node_cpu_seconds_total{mode="idle"}) without (cpu,mode)

      # Non-idle CPU usage per CPU core
      - record: instance_cpu:node_cpu_seconds_not_idle:rate5m
        expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) without (mode)
```

However, if I delete the second rule, the errors are gone. That is, they stop
when I change recording-rules.yaml on all servers to:
```
groups:
  - name: node-exporter
    rules:
      # CPU cores per node
      - record: instance:node_cpus:count
        expr: count(node_cpu_seconds_total{mode="idle"}) without (cpu,mode)
```

Why?

1. Why are there duplicates in the first case? Does the remote write
receiver also run the rules on the data it receives?
2. Why are there no errors anymore when the only rule is the CPU count?
Shouldn't there be duplicates in that case too?

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/ff37682b-cc2d-46b4-9010-c7617d41b068n%40googlegroups.com.
