> I have a historical question. Why do we write and maintain our own
> code to generate the metrics response instead of using the prometheus
> client library?
Old code, I think; at that time, Prometheus was not yet popular.

>> I have learned that the /metrics endpoint will be requested by more than
>> one metrics collect system.
>
> In practice, when does this happen?

In the cloud, for example, the cloud service provider may monitor the cluster while users monitor it as well.

>> PrometheusMetricsGenerator#generate will be invoked once in a period (such
>> as 1 minute), the result will be cached and returned for every metrics
>> collect request in the period directly.
>
> Since there are tradeoffs to the cache duration, we should make the
> period configurable.

Yes, of course.
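To illustrate, here is a rough sketch of what a configurable cache period could look like. The class name and setting are placeholders, not the actual Pulsar configuration, and the real generator writes to a stream; a String supplier just keeps the sketch small.

    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Supplier;

    /**
     * Sketch of a time-bounded cache for the generated metrics response.
     * All names are illustrative; the period would come from the broker
     * configuration (e.g. a hypothetical metricsCachePeriodSeconds).
     */
    public final class MetricsCache {
        private static final class Entry {
            final long generatedAtNanos;
            final byte[] payload;
            Entry(long generatedAtNanos, byte[] payload) {
                this.generatedAtNanos = generatedAtNanos;
                this.payload = payload;
            }
        }

        private final Supplier<String> generator;   // stands in for the metrics generator
        private final long periodNanos;              // the configurable cache period
        private final AtomicReference<Entry> cached = new AtomicReference<>();

        public MetricsCache(Supplier<String> generator, long periodSeconds) {
            this.generator = generator;
            this.periodNanos = periodSeconds * 1_000_000_000L;
        }

        /** Serves every scraper in the same period from one cached snapshot. */
        public byte[] get() {
            long now = System.nanoTime();
            Entry current = cached.get();
            if (current != null && now - current.generatedAtNanos < periodNanos) {
                return current.payload;
            }
            byte[] fresh = generator.get().getBytes(StandardCharsets.UTF_8);
            cached.compareAndSet(current, new Entry(now, fresh));
            return fresh;   // losing the CAS race is harmless: another thread just refreshed
        }
    }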
> On Feb 25, 2022, at 12:41 PM, Michael Marshall <mmarsh...@apache.org> wrote:
>
> I have a historical question. Why do we write and maintain our own
> code to generate the metrics response instead of using the prometheus
> client library?
>
>> I have learned that the /metrics endpoint will be requested by more than
>> one metrics collect system.
>
> In practice, when does this happen?
>
>> PrometheusMetricsGenerator#generate will be invoked once in a period (such
>> as 1 minute), the result will be cached and returned for every metrics
>> collect request in the period directly.
>
> Since there are tradeoffs to the cache duration, we should make the
> period configurable.
>
> Thanks,
> Michael
>
> On Wed, Feb 23, 2022 at 11:06 AM Jiuming Tao
> <jm...@streamnative.io.invalid> wrote:
>>
>> Hi all,
>>>
>>> 2. When there are hundreds of MB of metrics data collected, it causes high heap
>>> memory usage, high CPU usage and GC pressure. In the
>>> `PrometheusMetricsGenerator#generate` method, it uses
>>> `ByteBufAllocator.DEFAULT.heapBuffer()` to allocate memory for writing
>>> metrics data. The default size of `ByteBufAllocator.DEFAULT.heapBuffer()`
>>> is 256 bytes; when the buffer resizes, the new capacity is 512 bytes
>>> (powers of 2) and a `mem_copy` operation takes place.
>>> If I want to write 100 MB of data to the buffer, the final buffer size is
>>> 128 MB, and the total memory allocated is close to 256 MB (256 bytes + 512 bytes
>>> + 1 KB + ... + 64 MB + 128 MB). When the buffer size is greater than the netty
>>> buffer chunkSize (16 MB), it is allocated as an UnpooledHeapByteBuf on the
>>> heap. After the metrics data is written into the buffer, it is returned to the
>>> client by jetty, and jetty copies it into jetty's own buffer with yet another
>>> heap allocation!
>>> In this situation, to save memory, avoid high CPU usage (too many memory
>>> allocations and `mem_copy` operations) and reduce GC pressure, I want to change
>>> `ByteBufAllocator.DEFAULT.heapBuffer()` to
>>> `ByteBufAllocator.DEFAULT.compositeDirectBuffer()`; it avoids `mem_copy`
>>> operations and huge memory allocations (CompositeDirectByteBuf is a bit slower
>>> in read/write, but it's worth it). After writing the data, I will call the
>>> `HttpOutput#write(ByteBuffer)` method to write it to the client; that method
>>> does not cause a `mem_copy` (I have to wrap the ByteBuf into a ByteBuffer, and
>>> if the ByteBuf is wrapped, it is zero-copy).
>>
>> The JDK on my machine is JDK 15, and I just noticed that in JDK 8, ByteBuffer
>> cannot be extended and implemented. So, if allowed, I will write the metrics data
>> to temp files and send them to the client with jetty's send_file. It should turn
>> out to perform better than `CompositeByteBuf`, and use less CPU because it is
>> I/O-bound. (The /metrics endpoint will be a bit slower, but I believe it's worth it.)
>> If not allowed, that's fine; it still performs better than
>> `ByteBufAllocator.DEFAULT.heapBuffer()` (see the first image in the original
>> mail).
>>
>> Thanks,
>> Tao Jiuming
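P.S. For clarity, here is a rough sketch of the write path described in the quoted mail above, assuming Netty 4.1 and Jetty 9 APIs; the class name, chunk size and content-type handling are illustrative only, not an actual patch. The metrics text is accumulated in a CompositeByteBuf backed by fixed-size direct chunks, so the buffer never doubles and never mem_copies on growth, and the readable bytes are handed to Jetty as NIO ByteBuffers so no extra copy happens on the way out.

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.ByteBufAllocator;
    import io.netty.buffer.CompositeByteBuf;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import javax.servlet.http.HttpServletResponse;
    import org.eclipse.jetty.server.HttpOutput;

    public final class MetricsResponseWriter {
        // 1 MB per component; illustrative, not tuned.
        private static final int CHUNK_SIZE = 1024 * 1024;

        public static void writeMetrics(Iterable<String> metricLines,
                                        HttpServletResponse response) throws IOException {
            // Large maxNumComponents so the composite never consolidates (consolidation copies).
            CompositeByteBuf body = ByteBufAllocator.DEFAULT.compositeDirectBuffer(1024);
            try {
                ByteBuf chunk = ByteBufAllocator.DEFAULT.directBuffer(CHUNK_SIZE);
                for (String line : metricLines) {
                    byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
                    if (chunk.writableBytes() < bytes.length) {
                        body.addComponent(true, chunk);  // hand the full chunk over, no copy
                        chunk = ByteBufAllocator.DEFAULT.directBuffer(Math.max(CHUNK_SIZE, bytes.length));
                    }
                    chunk.writeBytes(bytes);
                }
                body.addComponent(true, chunk);

                response.setStatus(HttpServletResponse.SC_OK);
                response.setContentType("text/plain; version=0.0.4; charset=utf-8");
                // On Jetty the ServletOutputStream is an HttpOutput, whose write(ByteBuffer)
                // sends the direct buffer without copying it into another heap buffer.
                HttpOutput out = (HttpOutput) response.getOutputStream();
                for (ByteBuffer nio : body.nioBuffers()) {
                    out.write(nio);
                }
                out.flush();
            } finally {
                body.release();  // releases all added components as well
            }
        }
    }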