Hey Brian

On Tue, 2023-02-28 at 00:27 -0800, Brian Candler wrote:
> I can offer a couple more options:
>
> (1) Use two servers with federation.
> - server 1 does the scraping and keeps the detailed data for 2 weeks
> - server 2 scrapes server 1 at lower interval, using the federation
>   endpoint
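For my own notes, I guess server 2's scrape config for option (1) would look roughly like this (just a sketch from the docs; the job name, interval and match[] selector are my guesses):

```yaml
# Hypothetical scrape config on server 2 (the long-retention server).
# It pulls already-scraped series from server 1's /federate endpoint
# at a lower resolution (5m instead of 15s).
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 5m
    honor_labels: true          # keep the original instance/job labels
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'        # which series to pull over; a guess
    static_configs:
      - targets: ['server1:9090']
```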
I had thought about that as well. Though it feels a bit "ugly".

> (2) Use recording rules to generate lower-resolution copies of the
> primary timeseries - but then you'd still have to remote-write them
> to a second server to get the longer retention, since this can't be
> set at timeseries level.

I had (very briefly) read about the recording rules (merely that they
exist ^^) ... but wouldn't these give me a new name for the metric?

If so, I'd need to adapt e.g.
https://grafana.com/grafana/dashboards/1860-node-exporter-full/
to use the metrics generated by the recording rules, ... which again
seems like quite some maintenance effort.

Plus, as you wrote below, AFAIU I'd need users to use different
dashboards: one where the detailed data is used, one where the
downsampled data is used.

Sure, that would work as a workaround, but it's of course not really a
good solution, as one would rather want to "seamlessly" move from the
detailed to the less-detailed data.

> Either case makes the querying more awkward. If you don't want
> separate dashboards for near-term and long-term data, then it might
> work to stick promxy in front of them.

Which would however make the setup more complex again.

> Apart from saving disk space (and disks are really, really cheap
> these days), I suspect the main benefit you're looking for is to get
> faster queries when running over long time periods. Indeed, I
> believe Thanos creates downsampled timeseries for exactly this
> reason, whilst still continuing to retain all the full-resolution
> data as well.

I guess I may have to look into that, and how complex its setup would be.

> That depends. What PromQL query does your graph use? How many
> timeseries does it touch? What's your scrape interval?

So far I've just been playing with the ones from:
https://grafana.com/grafana/dashboards/1860-node-exporter-full/
So: all the queries in that dashboard, and all the time series they use.
The scrape interval is 15s.

> Is your VM backed by SSDs?
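(As an aside, coming back to option (2): from my brief reading, such a recording rule would look roughly like this. The `level:metric:operations` naming convention is what the docs suggest; the concrete metric is just my example. The new `record:` name is exactly why the dashboard queries would have to be adapted.)

```yaml
# Hypothetical recording rule: a 5m-averaged copy of a node_exporter
# series, evaluated at a lower resolution than the raw 15s scrapes.
groups:
  - name: downsampling
    interval: 5m
    rules:
      - record: instance:node_cpu_utilisation:rate5m
        expr: |
          1 - avg by (instance) (
            rate(node_cpu_seconds_total{mode="idle"}[5m])
          )
```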
I think it's a Ceph cluster that the supercomputing centre uses for
that, but I have no idea what that runs on. Probably HDDs.

> Another suggestion: running netdata within the VM will give you
> performance metrics at 1 second intervals, which can help identify
> what's happening during those 10-15 seconds: e.g. are you
> bottlenecked on CPU, or disk I/O, or something else.

Good idea, thanks.

Thanks,
Chris.

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/e35d617dbaab44de43da049414103ff1e9102e61.camel%40gmail.com.

