> I have a historical question. Why do we write and maintain our own
> code to generate the metrics response instead of using the prometheus
> client library?
Old code, I think; at that time, Prometheus was not yet popular.

>> I have learned that the /metrics endpoint will be requested by more than
>> one metrics collect system.
>
> In practice, when does this happen?

In the cloud, for example, the cloud service provider may monitor the cluster while users monitor it as well.

>> PrometheusMetricsGenerator#generate will be invoked once in a period (such
>> as 1 minute), the result will be cached and returned for every metrics
>> collect request in the period directly.
>
> Since there are tradeoffs to the cache duration, we should make the
> period configurable.

Yes, of course.
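To illustrate, here is a rough sketch of what a configurable cache period could look like. The class name and setting are placeholders, not the actual Pulsar configuration, and the real generator writes to a stream; a String supplier just keeps the sketch small.

    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Supplier;

    /**
     * Sketch of a time-bounded cache for the generated metrics response.
     * All names are illustrative; the period would come from the broker
     * configuration (e.g. a hypothetical metricsCachePeriodSeconds).
     */
    public final class MetricsCache {
        private static final class Entry {
            final long generatedAtNanos;
            final byte[] payload;
            Entry(long generatedAtNanos, byte[] payload) {
                this.generatedAtNanos = generatedAtNanos;
                this.payload = payload;
            }
        }

        private final Supplier<String> generator;   // stands in for the metrics generator
        private final long periodNanos;              // the configurable cache period
        private final AtomicReference<Entry> cached = new AtomicReference<>();

        public MetricsCache(Supplier<String> generator, long periodSeconds) {
            this.generator = generator;
            this.periodNanos = periodSeconds * 1_000_000_000L;
        }

        /** Serves every scraper in the same period from one cached snapshot. */
        public byte[] get() {
            long now = System.nanoTime();
            Entry current = cached.get();
            if (current != null && now - current.generatedAtNanos < periodNanos) {
                return current.payload;
            }
            byte[] fresh = generator.get().getBytes(StandardCharsets.UTF_8);
            cached.compareAndSet(current, new Entry(now, fresh));
            return fresh;   // losing the CAS race is harmless: another thread just refreshed
        }
    }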
> On Feb 25, 2022, at 12:41 PM, Michael Marshall <mmarsh...@apache.org> wrote:
>
> I have a historical question. Why do we write and maintain our own
> code to generate the metrics response instead of using the prometheus
> client library?
>
>> I have learned that the /metrics endpoint will be requested by more than
>> one metrics collect system.
>
> In practice, when does this happen?
>
>> PrometheusMetricsGenerator#generate will be invoked once in a period (such
>> as 1 minute), the result will be cached and returned for every metrics
>> collect request in the period directly.
>
> Since there are tradeoffs to the cache duration, we should make the
> period configurable.
>
> Thanks,
> Michael
>
> On Wed, Feb 23, 2022 at 11:06 AM Jiuming Tao
> <jm...@streamnative.io.invalid> wrote:
>>
>> Hi all,
>>>
>>> 2. When there are hundreds of MB of metrics data collected, it causes high heap
>>> memory usage, high CPU usage and GC pressure. In the
>>> `PrometheusMetricsGenerator#generate` method, it uses
>>> `ByteBufAllocator.DEFAULT.heapBuffer()` to allocate memory for writing
>>> metrics data. The default size of `ByteBufAllocator.DEFAULT.heapBuffer()`
>>> is 256 bytes; when the buffer resizes, the new capacity is 512 bytes
>>> (powers of 2) and a `mem_copy` operation takes place.
>>> If I want to write 100 MB of data to the buffer, the final buffer size is
>>> 128 MB, and the total memory allocated is close to 256 MB (256 bytes + 512 bytes
>>> + 1 KB + ... + 64 MB + 128 MB). When the buffer size is greater than the netty
>>> buffer chunkSize (16 MB), it is allocated as an UnpooledHeapByteBuf on the
>>> heap. After the metrics data is written into the buffer, it is returned to the
>>> client by jetty, and jetty copies it into jetty's own buffer with yet another
>>> heap allocation!
>>> In this situation, to save memory, avoid high CPU usage (too many memory
>>> allocations and `mem_copy` operations) and reduce GC pressure, I want to change
>>> `ByteBufAllocator.DEFAULT.heapBuffer()` to
>>> `ByteBufAllocator.DEFAULT.compositeDirectBuffer()`; it avoids `mem_copy`
>>> operations and huge memory allocations (CompositeDirectByteBuf is a bit slower
>>> in read/write, but it's worth it). After writing the data, I will call the
>>> `HttpOutput#write(ByteBuffer)` method to write it to the client; that method
>>> does not cause a `mem_copy` (I have to wrap the ByteBuf into a ByteBuffer, and
>>> if the ByteBuf is wrapped, it is zero-copy).
>>
>> The JDK on my machine is JDK 15, and I just noticed that in JDK 8, ByteBuffer
>> cannot be extended and implemented. So, if allowed, I will write the metrics data
>> to temp files and send them to the client with jetty's send_file. It should turn
>> out to perform better than `CompositeByteBuf`, and use less CPU because it is
>> I/O-bound. (The /metrics endpoint will be a bit slower, but I believe it's worth it.)
>> If not allowed, that's fine; it still performs better than
>> `ByteBufAllocator.DEFAULT.heapBuffer()` (see the first image in the original
>> mail).
>>
>> Thanks,
>> Tao Jiuming
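P.S. For clarity, here is a rough sketch of the write path described in the quoted mail above, assuming Netty 4.1 and Jetty 9 APIs; the class name, chunk size and content-type handling are illustrative only, not an actual patch. The metrics text is accumulated in a CompositeByteBuf backed by fixed-size direct chunks, so the buffer never doubles and never mem_copies on growth, and the readable bytes are handed to Jetty as NIO ByteBuffers so no extra copy happens on the way out.

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.ByteBufAllocator;
    import io.netty.buffer.CompositeByteBuf;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import javax.servlet.http.HttpServletResponse;
    import org.eclipse.jetty.server.HttpOutput;

    public final class MetricsResponseWriter {
        // 1 MB per component; illustrative, not tuned.
        private static final int CHUNK_SIZE = 1024 * 1024;

        public static void writeMetrics(Iterable<String> metricLines,
                                        HttpServletResponse response) throws IOException {
            // Large maxNumComponents so the composite never consolidates (consolidation copies).
            CompositeByteBuf body = ByteBufAllocator.DEFAULT.compositeDirectBuffer(1024);
            try {
                ByteBuf chunk = ByteBufAllocator.DEFAULT.directBuffer(CHUNK_SIZE);
                for (String line : metricLines) {
                    byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
                    if (chunk.writableBytes() < bytes.length) {
                        body.addComponent(true, chunk);  // hand the full chunk over, no copy
                        chunk = ByteBufAllocator.DEFAULT.directBuffer(Math.max(CHUNK_SIZE, bytes.length));
                    }
                    chunk.writeBytes(bytes);
                }
                body.addComponent(true, chunk);

                response.setStatus(HttpServletResponse.SC_OK);
                response.setContentType("text/plain; version=0.0.4; charset=utf-8");
                // On Jetty the ServletOutputStream is an HttpOutput, whose write(ByteBuffer)
                // sends the direct buffer without copying it into another heap buffer.
                HttpOutput out = (HttpOutput) response.getOutputStream();
                for (ByteBuffer nio : body.nioBuffers()) {
                    out.write(nio);
                }
                out.flush();
            } finally {
                body.release();  // releases all added components as well
            }
        }
    }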