On Sat, Jul 21, 2018 at 9:13 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> Brian,
>
> As Ralph already stated, this is likely a hwloc API issue.
> From debian9, you can run
> lstopo --of xml | ssh debian8 lstopo --if xml -i -
>
> that will likely confirm the API error.
>
> If you are willing to get a bit more detail, you can add some printf
> in opal_hwloc_unpack (from opal/mca/hwloc/base/hwloc_base_dt.c) to
> figure out where exactly the failure occurs.
>
> Meanwhile, you can move forward by using the embedded hwloc on both
> distros (--with-hwloc=internal or no --with-hwloc option at all).
>
>
> Note we strongly discourage configuring with --with-FOO=/usr
> (it explicitly adds /usr/include and /usr/lib[64] to the search path,
> and might hide some other external libraries installed in a non
> standard location). In order to force the external hwloc lib installed
> in the default location, --with-hwloc=external is what you need (same
> thing applies to libevent and pmix)
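
A rough sketch of the two configure variants described above, in case it helps anyone reading the archive. The helper name hwloc_flag and the /opt/openmpi prefix are assumptions made for illustration, not part of Open MPI; the script only prints the configure line and does not run it.

```shell
#!/bin/sh
# Sketch only: map a choice of hwloc source to the configure flag named in
# the thread. hwloc_flag is a hypothetical helper, not part of Open MPI.
hwloc_flag() {
    case "$1" in
        internal) printf '%s\n' '--with-hwloc=internal' ;;  # embedded hwloc
        external) printf '%s\n' '--with-hwloc=external' ;;  # system hwloc, default location
        *) echo "usage: hwloc_flag internal|external" >&2; return 1 ;;
    esac
}

# Placeholder prefix (assumption); the command is printed, not executed.
PREFIX="${PREFIX:-/opt/openmpi}"
echo "./configure --prefix=$PREFIX $(hwloc_flag internal)"
```

Whichever variant is chosen, using the same one on both hosts should avoid mixing hwloc versions between the launching node and the remote node.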

Thank you for the advice. Removing --with-hwloc from the configure
statement corrected the problem.

>
> Cheers,
>
> Gilles
> On Sun, Jul 22, 2018 at 7:52 AM r...@open-mpi.org <r...@open-mpi.org> wrote:
>>
>> More than likely the problem is the difference in hwloc versions - sounds
>> like the topology to/from xml is different between the two versions, and the
>> older one doesn’t understand the new one.
>>
>> > On Jul 21, 2018, at 12:04 PM, Brian Smith <bsm...@systemfabricworks.com>
>> > wrote:
>> >
>> > Greetings,
>> >
>> > I'm having trouble getting openmpi 2.1.2 to work when launching a
>> > process from debian 8 on a remote debian 9 host. To keep things simple
>> > in this example, I'm just launching date on the remote host.
>> >
>> > deb8host$ mpirun -H deb9host date
>> > [deb8host:01552] [[32763,0],0] ORTE_ERROR_LOG: Error in file
>> > base/plm_base_launch_support.c at line 954
>> >
>> > It works fine when executed from debian 9:
>> > deb9host$ mpirun -H deb8host date
>> > Sat Jul 21 13:40:43 CDT 2018
>> >
>> > Also works when executed from debian 8 against debian 8:
>> > deb8host:~$ mpirun -H deb8host2 date
>> > Sat Jul 21 13:55:57 CDT 2018
>> >
>> > The failure results from an error code returned by:
>> > opal_dss.unpack(buffer, &topo, &idx, OPAL_HWLOC_TOPO)
>> >
>> > openmpi was built with the same configure flags on both hosts.
>> >
>> > --prefix=$(PREFIX) \
>> > --with-verbs \
>> > --with-libfabric \
>> > --disable-silent-rules \
>> > --with-hwloc=/usr \
>> > --with-libltdl=/usr \
>> > --with-devel-headers \
>> > --with-slurm \
>> > --with-sge \
>> > --without-tm \
>> > --disable-heterogeneous \
>> > --with-contrib-vt-flags=--disable-iotrace \
>> > --sysconfdir=$(PREFIX)/etc \
>> > --libdir=$(PREFIX)/lib \
>> > --includedir=$(PREFIX)/include
>> >
>> >
>> > deb9host libhwloc and libhwloc-plugins is 1.11.5-1
>> > deb8host libhwloc and libhwloc-plugins is 1.10.0-3
>> >
>> > I've been trying to debug this for the past few days and would
>> > appreciate any help on determining why this failure is occurring
>> > and/or resolving the problem.
>> >
>> > --
>> > Brian T. Smith
>> > System Fabric Works
>> > Senior Technical Staff
>> > bsm...@systemfabricworks.com
>> > GPG Key: B3C2C7B73BA3CD7F
>> > _______________________________________________
>> > users mailing list
>> > users@lists.open-mpi.org
>> > https://lists.open-mpi.org/mailman/listinfo/users

--
Brian T. Smith
System Fabric Works
Senior Technical Staff
bsm...@systemfabricworks.com
GPG Key: B3C2C7B73BA3CD7F