Hello developers,

I am working on a C++17 project that uses Avro and hence avrogencpp. While
doing some soak testing I noticed that a long-running process that
publishes lots of Avro messages and hangs on to them (trust me, it
needs to for the case I am working on) consumes ever more heap memory.
Eventually the process gets killed by the OOM killer. This is a shame
because I want the memory for the messages to be drawn from a memory
pool that I create at start-up. I do allocate the Avro message itself
from the pool, but some parts of what is inside the message still use
the heap.

I have managed to make some progress with this. I've created a ticket,
https://issues.apache.org/jira/browse/AVRO-3705, so that the avrogencpp
from the RHEL repo can emit code that uses the C++17 std::any rather than
boost::any. I have a PR for this which adds the --cpp17 command line
option to avrogencpp. This means that for small pieces of data in an Avro
message, e.g. integers or booleans, the message will not allocate from
the heap, because std::any has an SBO (small buffer optimisation). It
also avoids the heap for string data if the string fits into the SSO
(small string optimisation) buffer of the std::string that we get
with gcc.

However, strings longer than the SSO buffer are still allocated from the
heap. Can anyone suggest what to do about this? I am relatively new to
Kafka and Avro, so I am not sure whether Avro supports the idea of
creating low-level custom types. If it does then maybe I need to create
such a type for strings that I know have a maximum size. Then, when I
create the Avro message, those strings would be values and would not
need the heap.

-- 
Regards,

Andrew Marlow
http://www.andrewpetermarlow.co.uk
