On 18.05.22 01:22, nina guo wrote:
> Hello everyone,
>
> 1. I'm running load testing:
>
> - case 1 - reboot the Prometheus server before starting a new round of
>   tests; general memory usage is then around 45% of 32GB for 1,700,000
>   series.
> - case 2 - without rebooting, run the next round directly; memory usage
>   is then around 65% of 32GB for the same number of series. I understand
>   that without rebooting, memory will still hold data from the last
>   round, but that data should be flushed to disk after 2-3 hours(?), so
>   memory consumption should drop back to 45% - why didn't this happen?
I assume you are talking about RAM? RAM usage in general, and RAM usage by
the Go runtime in particular, is hard to assess. There might be RAM that is
essentially freed up, but that the Go runtime hasn't returned to the OS yet
(though it might do so if the OS asks for it). On top of that, Prometheus
uses mmap'ing quite extensively, which is memory usage accounted for in a
different way and showing up in other metrics. It takes a real expert in
memory management to understand all of that properly (and I'm not one of
them).

> - case 3 - at 10:28AM prometheus_tsdb_head_series is 1,680,000, but
>   after one hour the series count had almost doubled to 3,100,000, and
>   memory consumption increased to 85% of 32GB. Why did the number of
>   time series double? One more hour later, prometheus_tsdb_head_series
>   was back to 1,680,000. What probably happened during this one hour?

Are you talking about head series? If they double, that's usually a sign
of a global label change, which creates every series as a new series, so
you have twice as many until the old series are no longer part of the head
(which can take up to a couple of hours).

> 2. Local disk space attached to the Prometheus server; retention period
>    is 1h. (At 3:35, the test started.)
>
>     3:35  used 655.2M
>     4:35  used 1.2G
>     5:38  used 1.5G
>     6:38  used 2.4G
>     7:38  used 1.7G
>     8:38  used 2.6G
>     9:38  used 1.7G
>    10:38  used 2.7G
>    11:38  used 1.7G
>
> From the above record, I have several questions:
>
> - I scraped 1,700,000 metrics every 60s, but why did the data grow by
>   different amounts in 3:35-4:35, 4:35-5:38, 5:38-6:38, 7:38-8:38, and
>   9:38-10:38?
> - I tried to use this formula to estimate the disk space first:
>   "needed_disk_space = retention_time_seconds *
>   ingested_samples_per_second * bytes_per_sample"
>   -> 60*60*(1700000/60)*2 = 201,600,000 bytes
>   This differs from my testing record, where the increase was between
>   400M and 1G. Why is the actual result not consistent with the
>   estimate?
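As a quick sanity check of the quoted estimate (a sketch using the figures from the question; note that 60*60*(1700000/60)*2 actually comes out to 204,000,000 bytes, while the quoted 201,600,000 corresponds to the 1,680,000 head series mentioned earlier, and ~2 bytes per sample is itself only a rough average for compressed samples):

```python
# Back-of-the-envelope check of the quoted formula:
#   needed_disk_space = retention_seconds * samples_per_second * bytes_per_sample
# All figures are taken from the quoted question; bytes_per_sample = 2 is
# a rough average for compressed samples, not a guarantee.

retention_seconds = 60 * 60        # 1h retention
series = 1_700_000                 # series scraped
scrape_interval_seconds = 60       # one scrape per minute
bytes_per_sample = 2

samples_per_second = series / scrape_interval_seconds
estimate = retention_seconds * samples_per_second * bytes_per_sample
print(f"{estimate:,.0f} bytes (~{estimate / 1e6:.0f} MB)")
# -> 204,000,000 bytes (~204 MB)

# With the 1,680,000 head series figure instead, the same formula yields
# exactly the quoted 201,600,000 bytes.
```

Either way, this estimate only covers compressed sample data; it ignores the WAL and index overhead, which is part of why the observed disk growth is larger.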
> - I understand the data reduction is due to the 1h retention period, am
>   I right?

The smallest block size is two hours, which is the smallest granularity at
which Prometheus can delete data. If you have a retention of only 1h, you
will get a clearly noticeable sawtooth pattern. Essentially, you will store
more and more data until you have a history of 3h, which is then cut down
to 1h once a full 2h block can be deleted without losing the 1h of
retention you have configured. Rinse and repeat.

On top of that, there is the WAL, which gets truncated when data is written
to a proper block on disk. This also happens in that 2h cadence, so you get
a lot of up and down in relative terms. A more common retention is
something like two weeks, at which point all those 2h sawtooth patterns
don't matter that much anymore in relative terms.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] [email protected]

--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/Yo5aI57Y4OXZt5U/%40mail.rabenste.in.
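P.S. The sawtooth described above can be sketched with a toy simulation (an idealized model, not Prometheus's actual compaction code: it assumes perfectly aligned 2h blocks and ignores the WAL):

```python
# Toy model of the retention sawtooth (a sketch, not Prometheus's real
# compaction logic). Blocks are 2h wide, and a block can only be deleted
# once even its newest sample is outside the retention window, so with 1h
# retention the stored history swings between roughly 1h and 3h of data.

RETENTION_H = 1    # configured retention, in hours
BLOCK_H = 2        # smallest block size, in hours

blocks = []        # (start_hour, end_hour) of persisted 2h blocks
history = []       # hours of stored data, sampled once per hour

for hour in range(1, 13):
    if hour % BLOCK_H == 0:
        # head data is persisted into a new 2h block
        blocks.append((hour - BLOCK_H, hour))
    head_hours = hour % BLOCK_H  # data still only in the head
    history.append(sum(e - s for s, e in blocks) + head_hours)
    # a block is deletable once it lies entirely outside the retention window
    blocks = [(s, e) for s, e in blocks if e > hour - RETENTION_H]

print(history)
# -> [1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2]
```

The stored history ramps up to 3h just before a full 2h block becomes deletable, then drops - the sawtooth. With a two-week retention, the same 2h swings still happen but are negligible relative to the total.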

