Siegmar, I was able to reproduce the issue on my VM (no need for a real heterogeneous cluster here).
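For reference, my reproducer is essentially the following minimal sketch (this is not Siegmar's actual spawn_master.c, which I don't have at hand; the "spawn_slave" executable name is made up, the point is simply one MPI_Comm_spawn call that launches 4 slaves):

#include <stdio.h>
#include <mpi.h>

#define NUM_SLAVES 4

int main(int argc, char *argv[])
{
  MPI_Comm intercomm;
  int rank, len, errcodes[NUM_SLAVES];
  char name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &len);
  printf("Parent process %d running on %s\n", rank, name);
  printf("  I create %d slave processes\n", NUM_SLAVES);

  /* all four slaves come from a single spawn call; in my runs it is
     the 4th spawned task that crashes, not this parent */
  MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES,
                 MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                 &intercomm, errcodes);

  MPI_Comm_disconnect(&intercomm);
  MPI_Finalize();
  return 0;
}

Built with mpicc and started exactly as in Siegmar's command line quoted below (mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master).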
I will keep digging tomorrow.
Note that if you specify an incorrect slot list, MPI_Comm_spawn fails with a
very unfriendly error message. Right now, the 4th spawn'ed task crashes, so
this is a different issue.

Cheers,

Gilles

r...@open-mpi.org wrote:
>I think there is some relevant discussion here:
>https://github.com/open-mpi/ompi/issues/1569
>
>It looks like Gilles had (at least at one point) a fix for master when
>enable-heterogeneous, but I don’t know if that was committed.
>
>On Jan 9, 2017, at 8:23 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
>Hi Siegmar,
>
>You have some config parameters I wasn't trying that may have some impact.
>I'll give it a try with these parameters.
>
>This should be enough info for now.
>
>Thanks,
>
>Howard
>
>2017-01-09 0:59 GMT-07:00 Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de>:
>
>Hi Howard,
>
>I use the following commands to build and install the package.
>${SYSTEM_ENV} is "Linux" and ${MACHINE_ENV} is "x86_64" for my
>Linux machine.
>
>mkdir openmpi-2.0.2rc3-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>cd openmpi-2.0.2rc3-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>
>../openmpi-2.0.2rc3/configure \
>  --prefix=/usr/local/openmpi-2.0.2_64_cc \
>  --libdir=/usr/local/openmpi-2.0.2_64_cc/lib64 \
>  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
>  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
>  JAVA_HOME=/usr/local/jdk1.8.0_66 \
>  LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
>  CFLAGS="-m64 -mt" CXXFLAGS="-m64" FCFLAGS="-m64" \
>  CPP="cpp" CXXCPP="cpp" \
>  --enable-mpi-cxx \
>  --enable-mpi-cxx-bindings \
>  --enable-cxx-exceptions \
>  --enable-mpi-java \
>  --enable-heterogeneous \
>  --enable-mpi-thread-multiple \
>  --with-hwloc=internal \
>  --without-verbs \
>  --with-wrapper-cflags="-m64 -mt" \
>  --with-wrapper-cxxflags="-m64" \
>  --with-wrapper-fcflags="-m64" \
>  --with-wrapper-ldflags="-mt" \
>  --enable-debug \
>  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>
>make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>rm -r /usr/local/openmpi-2.0.2_64_cc.old
>mv /usr/local/openmpi-2.0.2_64_cc /usr/local/openmpi-2.0.2_64_cc.old
>make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>
>I get a different error if I run the program with gdb.
>
>loki spawn 118 gdb /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec
>GNU gdb (GDB; SUSE Linux Enterprise 12) 7.11.1
>Copyright (C) 2016 Free Software Foundation, Inc.
>License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>This is free software: you are free to change and redistribute it.
>There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>and "show warranty" for details.
>This GDB was configured as "x86_64-suse-linux".
>Type "show configuration" for configuration details.
>For bug reporting instructions, please see:
><http://bugs.opensuse.org/>.
>Find the GDB manual and other documentation resources online at:
><http://www.gnu.org/software/gdb/documentation/>.
>For help, type "help".
>Type "apropos word" to search for commands related to "word"...
>Reading symbols from /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec...done.
>(gdb) r -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
>Starting program: /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
>Missing separate debuginfos, use: zypper install glibc-debuginfo-2.24-2.3.x86_64
>[Thread debugging using libthread_db enabled]
>Using host libthread_db library "/lib64/libthread_db.so.1".
>[New Thread 0x7ffff3b97700 (LWP 13582)]
>[New Thread 0x7ffff18a4700 (LWP 13583)]
>[New Thread 0x7ffff10a3700 (LWP 13584)]
>[New Thread 0x7fffebbba700 (LWP 13585)]
>Detaching after fork from child process 13586.
>
>Parent process 0 running on loki
>  I create 4 slave processes
>
>Detaching after fork from child process 13589.
>Detaching after fork from child process 13590.
>Detaching after fork from child process 13591.
>[loki:13586] OPAL ERROR: Timeout in file ../../../../openmpi-2.0.2rc3/opal/mca/pmix/base/pmix_base_fns.c at line 193
>[loki:13586] *** An error occurred in MPI_Comm_spawn
>[loki:13586] *** reported by process [2873294849,0]
>[loki:13586] *** on communicator MPI_COMM_WORLD
>[loki:13586] *** MPI_ERR_UNKNOWN: unknown error
>[loki:13586] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>[loki:13586] *** and potentially your MPI job)
>[Thread 0x7fffebbba700 (LWP 13585) exited]
>[Thread 0x7ffff10a3700 (LWP 13584) exited]
>[Thread 0x7ffff18a4700 (LWP 13583) exited]
>[Thread 0x7ffff3b97700 (LWP 13582) exited]
>[Inferior 1 (process 13567) exited with code 016]
>Missing separate debuginfos, use: zypper install libpciaccess0-debuginfo-0.13.2-5.1.x86_64 libudev1-debuginfo-210-116.3.3.x86_64
>(gdb) bt
>No stack.
>(gdb)
>
>Do you need anything else?
>
>Kind regards
>
>Siegmar
>
>On 08.01.2017 at 17:02, Howard Pritchard wrote:
>
>Hi Siegmar,
>
>Could you post the configure options you used when building 2.0.2rc3?
>Maybe that will help in trying to reproduce the segfault you are observing.
>
>Howard
>
>2017-01-07 2:30 GMT-07:00 Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de>:
>
>  Hi,
>
>  I have installed openmpi-2.0.2rc3 on my "SUSE Linux Enterprise
>  Server 12 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Unfortunately,
>  I still get the same error that I reported for rc2.
>
>  I would be grateful if somebody can fix the problem before
>  releasing the final version. Thank you very much for any help
>  in advance.
>
>  Kind regards
>
>  Siegmar
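One remark on the gdb session quoted above: the OPAL timeout and the MPI_Comm_spawn error are reported by the forked child (pid 13586), which gdb detached from, so "bt" in the mpiexec process unsurprisingly shows no stack. To get a backtrace from the spawned side, something along these lines should help (standard gdb fork settings; I have not checked how well they behave under mpiexec, so please treat this as a suggestion only):

(gdb) set detach-on-fork off
(gdb) set follow-fork-mode child
(gdb) r -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Alternatively, attaching a second gdb to the child pid printed in the error message (gdb -p <pid>) while the job is still running would also give a usable stack.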
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users