Hello developers,

I am working on a C++17 project that uses Avro and hence avrogencpp. During soak testing I noticed that a long-running process that publishes lots of Avro messages, and that holds on to them (trust me, it needs to for the case I am working on), consumes ever more heap memory. Eventually the process gets killed by the OOM killer. This is a shame because I want the memory used for the messages to be drawn from a memory pool that I create at start-up. I do allocate the Avro message itself from the pool, but some parts of what is inside the message still use the heap.
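For context, one standard C++17 way to draw allocations from a pool created up front is std::pmr. This is a generic sketch of that idea, not Avro or avrogencpp API (makePooledString is a made-up helper); whether the generated Avro types can be made allocator-aware is a separate question:

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>

// Strings built this way take their character storage from the pool.
// Using std::pmr::null_memory_resource() as the upstream resource means
// that exhausting the pool throws std::bad_alloc rather than silently
// falling back to the global heap -- useful when soak testing.
std::pmr::string makePooledString(std::pmr::memory_resource& pool,
                                  const char* text) {
    return std::pmr::string(text, &pool);
}
```

The pool itself would be something like a std::pmr::monotonic_buffer_resource wrapping a buffer reserved at start-up.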
I have managed to make some progress with this. I created a ticket, https://issues.apache.org/jira/browse/AVRO-3705, so that the avrogencpp from the RHEL repo can emit code that uses the C++17 std::any rather than boost::any, and I have a PR for this which adds a --cpp17 command-line option to avrogencpp. With this change, small pieces of data in an Avro message, e.g. integers or booleans, no longer allocate from the heap, because std::any has a small buffer optimisation (SBO). Strings also avoid the heap as long as they fit into the small string optimisation (SSO) buffer that we get with GCC's std::string. However, strings longer than the SSO buffer are still heap-allocated.

Can anyone suggest what to do about this? I am relatively new to Kafka and Avro, so I am not sure whether Avro supports creating low-level custom types. If it does, then maybe I need to create such a type for strings that I know have a maximum size. Those strings would then be stored by value in the Avro message and would not need the heap.

--
Regards,
Andrew Marlow
http://www.andrewpetermarlow.co.uk
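P.S. In case it helps frame the question: the kind of custom type I have in mind for bounded strings is a fixed-capacity string that keeps its characters inline. FixedString below is a hypothetical name, and the wiring to Avro's encode/decode machinery is deliberately not shown, since whether Avro supports such a type is exactly what I am asking:

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Sketch of a fixed-capacity string: all N characters live inline in a
// std::array, so constructing one never touches the heap, regardless of
// the string's length (up to the fixed maximum).
template <std::size_t N>
class FixedString {
public:
    FixedString() = default;

    explicit FixedString(const std::string& s) {
        if (s.size() > N)
            throw std::length_error("string exceeds fixed capacity");
        len_ = s.size();
        std::memcpy(buf_.data(), s.data(), len_);
    }

    // Conversion back to std::string, e.g. at the encode boundary.
    std::string str() const { return std::string(buf_.data(), len_); }

    std::size_t size() const { return len_; }
    static constexpr std::size_t capacity() { return N; }

private:
    std::array<char, N> buf_{};  // inline storage, no heap
    std::size_t len_ = 0;
};
```

A value of this type embedded in a message struct adds nothing to the heap footprint, at the cost of always occupying N bytes.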