Hey Vitaly, I'm not sure what the best solution here is. Ideally, we want both pycapnp and nupic to always link to the exact same compiled version of C++ Cap'n Proto.
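Until the two are guaranteed to share one build, a consumer could at least refuse to cast an object that came from an incompatible build. Below is a minimal sketch of that idea. It is not pycapnp or nupic code: `BuildFingerprint`, `SharedThing`, `localFingerprint`, `compatible`, and `checkedCast` are all hypothetical names, and it assumes the producing extension would export its fingerprint somehow. Matching fingerprints are a conservative heuristic, not proof of ABI compatibility.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical: what each extension would record about its own build of the
// shared C++ library. Nothing like this exists in pycapnp or nupic.bindings.
struct BuildFingerprint {
  char libraryVersion[16];         // e.g. "0.5.3"
  std::uint32_t compilerMajor;     // e.g. the value of __GNUC__
  std::uint32_t sizeofSharedType;  // sizeof() of the type that crosses the boundary
};

// Stand-in for the object handed from one extension to the other.
struct SharedThing {
  void* impl;
  std::uint32_t state;
};

// Fingerprint of the build this translation unit belongs to.
inline BuildFingerprint localFingerprint() {
  BuildFingerprint fp{};  // zero-initialized, so the version string stays terminated
  std::strncpy(fp.libraryVersion, "0.5.3", sizeof(fp.libraryVersion) - 1);
#if defined(__GNUC__)
  fp.compilerMajor = __GNUC__;
#else
  fp.compilerMajor = 0;
#endif
  fp.sizeofSharedType = sizeof(SharedThing);
  return fp;
}

// Conservative comparison: every recorded field must match.
inline bool compatible(const BuildFingerprint& producer,
                       const BuildFingerprint& consumer) {
  return std::strncmp(producer.libraryVersion, consumer.libraryVersion,
                      sizeof(producer.libraryVersion)) == 0 &&
         producer.compilerMajor == consumer.compilerMajor &&
         producer.sizeofSharedType == consumer.sizeofSharedType;
}

// Returns a typed pointer only when the producer's build matches ours;
// otherwise nullptr, so the caller can raise an error instead of
// dereferencing an object whose layout it may be misreading.
inline SharedThing* checkedCast(void* opaque, const BuildFingerprint& producer) {
  return compatible(producer, localFingerprint())
             ? static_cast<SharedThing*>(opaque)
             : nullptr;
}

// Usage example: a matching producer is accepted, a mismatched one is refused.
inline void selfTest() {
  SharedThing thing{nullptr, 0};
  BuildFingerprint same = localFingerprint();
  assert(checkedCast(&thing, same) == &thing);

  BuildFingerprint other = same;
  other.compilerMajor += 1;  // pretend the producer used a newer toolchain
  assert(checkedCast(&thing, other) == nullptr);
}
```

A check like this would only turn a hang into a clear error message; it would not make two separately compiled copies interoperate.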
Perhaps it should be pycapnp's responsibility to always bundle a compiled distribution of C++ Cap'n Proto, complete with headers. Then nupic's build process could call some pycapnp function that returns the relevant include and library directories (similar to how numpy.get_include() works). Does that sound reasonable to you?

On Thu, Jul 28, 2016 at 6:17 PM, vitaly numenta < [email protected]> wrote:

> We see errors converting pycapnp builders to C++ capnp builders on Ubuntu
> 16.04 when using our Python extensions compiled under the "manylinux"
> environment (CentOS-6.8 with gcc 4.8.2).
>
> We pass a pycapnp builder to C++ using this code:
> https://github.com/numenta/nupic.core/blob/064f8b1ef003d5ee07405cd5ac41583f83ab1d35/src/nupic/py_support/PyCapnp.hpp#L71
>
> When we cast the schema parser to `pycapnp_SchemaParser*` and deref the
> `thisptr` attribute, the values appear bogus, suggesting an incorrect cast.
>
> Pycapnp was installed on Ubuntu 16.04 and builds the extensions and
> capnproto using gcc 5.4.0.
>
> Is it possible that the SchemaParser or SchemaLoader struct from the pycapnp
> extension built with gcc 5.4.0 has different alignment/layout than expected
> by the cast in the NuPIC C extension compiled with gcc 4.8.2?
>
>
> More details...
>
> First, an overview of capnp's integration into nupic and nupic.bindings: The
> `nupic` pure python package gets capnp via the `pycapnp==0.5.8` package,
> which contains its own version of compiled `capnproto` sources.
> `nupic.bindings`, a python extension built in `nupic.core`, includes its own
> version of capnp 0.5.3 sources compiled into the extension's shared
> libraries, such as `_algorithms.so`, `_math.so`, etc. Nupic.bindings contains
> the C++ implementation of classes and supporting logic used by nupic.
>
> So, when nupic is used, there are two versions of compiled capnproto in
> play: one from pycapnp imported by nupic, and another built into the
> nupic.bindings extension.
> On the Ubuntu 16.04 system, pycapnp's capnproto C++ sources were compiled
> via gcc/g++ 5.4.0 during installation of pycapnp on that system. The
> capnproto C++ sources in nupic.bindings were compiled on CentOS-6.8 using
> gcc/g++ 4.8.2 during the build of the "manylinux" nupic.bindings wheel. Note
> that those toolchains are a MAJOR VERSION apart and the two extensions
> compile the capnproto C++ sources independently using their own sets of
> compiler/linker flags and options (not to mention that the two versions of
> capnp sources, although similar, might not be identical).
>
> When nupic wants to serialize a nupic.bindings-based object, nupic passes
> the python Builder object instantiated by pycapnp to the nupic.bindings
> python extension, whose C++ code extracts the C++ Builder from the python
> Builder. For example, in the case of the Random class, nupic.bindings'
> _math.so extracts the C++ RandomProto::Builder instance from the python
> Builder instance at
> https://github.com/numenta/nupic.core/blob/0.4.4/src/nupic/bindings/math.i#L374-L375,
> then passes the extracted builder instance to the C++ Random object's
> `write` method for serialization.
>
> So, the nupic.bindings extension's shared libs pass C++ capnp objects
> instantiated by pycapnp's build of capnp to nupic.bindings-based methods
> that act on those capnp objects using methods in nupic.bindings' own build
> of capnproto. To reiterate, objects instantiated by pycapnp's build of
> capnproto are being operated on by methods in nupic.bindings' own build of
> capnproto code.
>
> This integration happens to work when pycapnp and nupic.bindings are both
> compiled/linked on the same platform. Also, it seems to work when the two
> are compiled/linked with nearby versions of toolchains, such as pycapnp
> being built on Ubuntu 14.04 with gcc/g++ 4.8.4 and nupic.bindings being
> built on CentOS-6.8 with gcc/g++ 4.8.2.
>
> However, the integration misbehaves when installed on Ubuntu Server 16.04.
> In this case, pycapnp==0.5.8 is built (as the result of installation from
> PyPI) on Ubuntu 16.04 by gcc/g++ 5.4.0, but the manylinux nupic.bindings
> wheel was built on CentOS-6.8 using gcc/g++ 4.8.2. The detailed root-cause
> analysis is in
> https://github.com/numenta/nupic.core/issues/1013#issuecomment-235736477
> (look for "ROOT-CAUSE ANALYSIS" in that github issue). The short version of
> it is:
>
> 1. nupic.bindings extracts the C++ capnp Builder object from the python
> Builder that was instantiated by the pycapnp python extension.
> nupic.bindings uses this function that's linked into _math.so to extract the
> C++ Builder object:
>
> ```
> template<class T>
> typename T::Builder getBuilder(PyObject* pyBuilder)
> {
>   PyObject* capnpModule = PyImport_AddModule("capnp.lib.capnp");
>   PyObject* pySchemaParser =
>       PyObject_GetAttrString(capnpModule, "_global_schema_parser");
>
>   pycapnp_SchemaParser* schemaParser = (pycapnp_SchemaParser*)pySchemaParser;
>   schemaParser->thisptr->loadCompiledTypeAndDependencies<T>();
>
>   pycapnp_DynamicStructBuilder* dynamicStruct =
>       (pycapnp_DynamicStructBuilder*)pyBuilder;
>   capnp::DynamicStruct::Builder& builder = dynamicStruct->thisptr;
>   typename T::Builder proto = builder.as<T>();
>   return proto;
> }
> ```
>
> 2. The statement `schemaParser->thisptr->loadCompiledTypeAndDependencies<T>()`
> invokes the `capnp::SchemaParser::loadCompiledTypeAndDependencies()` method
> on `thisptr`, which is a pointer to the `capnp::SchemaParser` instance
> instantiated by pycapnp's capnp code.
>
> 3. However, because `nupic::getBuilder<RandomProto>` is compiled into
> nupic.bindings' python extension that includes its own version of capnp (in
> _math.so, in this case), the call to
> `capnp::SchemaParser::loadCompiledTypeAndDependencies<T>()` resolved to
> capnp in _math.so, instead of the capnp code in the pycapnp build that
> instantiated this `capnp::SchemaParser` object.
>
> 4. This is where things get hairy: when we use gdb to examine the contents
> of the `capnp::SchemaLoader` referenced by the extracted
> `capnp::SchemaParser` (that was instantiated by pycapnp's capnp code) at the
> point where `capnp::SchemaLoader::loadNative` is called inside
> nupic.bindings' own build of capnp, we observe that the instance member
> contents don't make any sense. There is apparently some mismatch taking
> place between the capnp::SchemaLoader object instantiated by pycapnp's capnp
> code (built with g++ 5.4.0) and the corresponding capnp::SchemaLoader class
> in the manylinux nupic.bindings wheel (built with g++ 4.8.2):
>
> ```
> (gdb) p this
> $17 = (capnp::SchemaLoader * const) 0x103cba0
> (gdb) p *this
> $18 = {impl = {mutex = {futex = 4031237736, static EXCLUSIVE_HELD = 2147483648,
>     static EXCLUSIVE_REQUESTED = 1073741824, static SHARED_COUNT_MASK = 1073741823},
>   value = {disposer = 0x7ffff07208e8, ptr = 0xfffffffffffffffd}}}
>
> or in hex like this:
>
> (gdb) p/x *this
> $26 = {impl = {mutex = {futex = 0xf047ce68, static EXCLUSIVE_HELD = 0x80000000,
>     static EXCLUSIVE_REQUESTED = 0x40000000, static SHARED_COUNT_MASK = 0x3fffffff},
>   value = {disposer = 0x7ffff07208e8, ptr = 0xfffffffffffffffd}}}
> ```
>
> In particular, we note that the instance member `mutex.futex` has an invalid
> value 0xf047ce68 (it should have been 0 at this point in the single-threaded
> execution); impl.value.ptr also has an invalid value of 0xfffffffffffffffd;
> it should have been either null or a valid pointer.
> Subsequently, when kj::Mutex::lock attempts to lock the futex, the system
> call never returns, because of the bogus value in mutex.futex.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Cap'n Proto" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> Visit this group at https://groups.google.com/group/capnproto.
