Hello,

Tl;dr: While profiling, I found that the C++ thin client spends a lot of time in memset, almost as much as it spends in the socket recv call.
Longer version: I was profiling a test harness that does a Get from our Ignite grid. The test harness is written in C++ using the thin client. I profiled the code using gperftools and found that memset (__memset_sse2 in my case) was taking a large portion of the execution time.

I'm Get-ting a BinaryType which contains a std::string, an int32_t, and an array of int8_t. For my test case, the size of the int8_t array can vary, but I got the best throughput on my machine using about 100MB for that array.

I profiled the test code doing a single Get, and doing 8 Gets. In the 8 Get case, the number of memset calls increased, but the percentage of overall time spent in memset was reduced. However, it was not reduced as much as I'd hoped. I was hoping that the first Get call would have a large memset and the rest of the Get calls would skip it, but that does not seem to be the case. I'm seeing almost as much time spent in memset as in recv (__libc_recv in this case). That seems like a lot of time spent initializing.

My profiler's call stack shows that std::vector::resize is to blame for all the memset calls. I believe the resize is invoked inside ignite::binary::BinaryType::Read, and that the source file is modules/platforms/cpp/binary/src/impl/binary/binary_reader_impl.cpp. In the code I'm looking at, it's line 905. There are only two calls to resize in this source file, and I think the one in ReadTopObject0 is the culprit. I didn't compile with debug symbols, so I haven't confirmed which resize call it is.

Is there any way we can avoid std::vector::resize? I suspect that ultimately the buffer gets passed somewhere to a socket recv call, which takes a pointer and a length. In that case, I don't know of any way to use a plain std::vector for the buffer and still avoid the unnecessary initialization (memset) in the resize call.
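For illustration, here is a minimal C++11 sketch of how a vector could skip the zero-fill and be kept around between reads. None of this is Ignite code: default_init_allocator, RawBuffer, and PrepareBuffer are names I made up, and I have not checked how this would fit into the reader's actual buffer handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator adaptor (not part of Ignite or the standard library).
// It turns the value-initialization done by vector::resize(n) into
// default-initialization, which does nothing for trivial types like int8_t.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    typedef std::allocator_traits<A> traits;

public:
    template <typename U>
    struct rebind {
        typedef default_init_allocator<
            U, typename traits::template rebind_alloc<U>> other;
    };

    using A::A;

    // Used for elements created without arguments: no zero-fill.
    template <typename U>
    void construct(U* ptr)
        noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(ptr)) U;
    }

    // Every other construction is forwarded to the underlying allocator.
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr,
                          std::forward<Args>(args)...);
    }
};

// Hypothetical buffer type for raw bytes read from a socket.
typedef std::vector<std::int8_t, default_init_allocator<std::int8_t>> RawBuffer;

// Hypothetical helper: grows the buffer only when it is too small, so
// repeated reads of similar size neither reallocate nor re-initialize.
// Returns a pointer that can be passed to recv together with len.
inline std::int8_t* PrepareBuffer(RawBuffer& buf, std::size_t len) {
    if (buf.size() < len)
        buf.resize(len);  // new elements are left uninitialized
    return buf.data();
}
```

With something like this, the caller would have to track how many bytes are actually valid, since the vector's size() only ever grows and the newly added bytes hold indeterminate values until recv fills them.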
Could another container be used instead of a vector? Could the vector be reused, so that subsequent calls don't need to resize it again? Could something like uvector (which skips initialization) be used instead?

Thanks,
Brett