Hi Kenton, I have good news: my basic prototype of serializing across
C++/Python boundaries (in both directions) by passing capnp byte buffers is
working. I am now shifting to optimizing memory utilization. In NuPIC, our
core machine learning algorithm objects can get huge, into the gigabytes, and
we run many of them on the same machine, serializing periodically. So memory
utilization is critical for minimizing compute resource cost.

Presently, I am focusing on the C++ extension => Python deserialization
control flow. In this scenario, the C++ Python extension layer has a
message reader that contains a Python object encoding. So we need to
extract the byte buffer representing the Python-native object in the C++
code in order to pass it to the Python layer. This is what the relevant code
in C++ looks like:

PyObject* Network::_readPyRegion(const std::string& moduleName,
                                 const std::string& className,
                                 const RegionProto::Reader& proto)
{
  // Extract data bytes from reader to pass to the Python layer.
  // (The declaration of pyRegionImplProto is not shown in this excerpt.)
  capnp::MallocMessageBuilder builder;
  builder.setRoot(pyRegionImplProto);               // copy 1
  auto array = capnp::messageToFlatArray(builder);  // copy 2

  // Copy from array to PyObject so that we can pass it to the Python layer
  py::String pyRegionImplBytes(reinterpret_cast<const char*>(array.begin()),
                               sizeof(capnp::word) * array.size());  // copy 3
  // (The remainder of the function is omitted in this excerpt.)
}

As you can see, this involves three copies of potentially huge amounts
of data. The Python layer will then reconstruct a reader from those bytes
using pycapnp (yet another copy).

Ideally, I would like to extract the data segment(s) directly from
RegionProto::Reader, but that doesn't appear to be supported. I think that
we need to find or create some way to handle this efficiently in order to
support serialization/deserialization across C++/Python boundaries.
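
To make the difference concrete, here is a rough sketch of the two ways of handing segment bytes to another layer. It uses only standard-library stand-ins: Word, ByteView, copyToString, viewOf and checkPaths are hypothetical names made up for illustration, not Cap'n Proto or pycapnp API, and a std::vector stands in for a message segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for capnp::word (8 bytes); NOT the real Cap'n Proto type.
using Word = std::uint64_t;

// Hypothetical non-owning view over the bytes of one message segment.
struct ByteView {
  const char* data;
  std::size_t size;
};

// Copying path, mirroring the last step of the code above: the words are
// duplicated into a newly allocated string of sizeof(Word) * count bytes.
std::string copyToString(const std::vector<Word>& segment) {
  return std::string(reinterpret_cast<const char*>(segment.data()),
                     sizeof(Word) * segment.size());
}

// Zero-copy path: expose the segment's existing storage. The view is only
// valid while `segment` is alive and unmodified.
ByteView viewOf(const std::vector<Word>& segment) {
  return ByteView{reinterpret_cast<const char*>(segment.data()),
                  sizeof(Word) * segment.size()};
}

// Small self-check: both paths describe the same number of bytes, but only
// the view points at the segment's own storage.
void checkPaths() {
  std::vector<Word> segment(4, 42);
  std::string copied = copyToString(segment);
  ByteView view = viewOf(segment);
  assert(copied.size() == view.size);
  assert(view.data == reinterpret_cast<const char*>(segment.data()));
}
```

With a view like this, the Python side would need some way to wrap existing memory without copying it (for example, through Python's buffer protocol), and the underlying message would have to stay alive for as long as Python holds the view. Whether the reader can expose its segments this way is exactly the open question.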
Thank you,
Vitaly