Unfortunately, removing the preconnect_all option didn't resolve the problem.
I tried changing a few of the other options that I pass to mpirun. What does seem to make a difference is the "--map-by node" option. If I remove that option then my test code runs successfully - the output is in the attached test.err file.

Ideally I'd like to be able to use "--map-by node" so that the initial processes are distributed across the available resources. Is there some reason why the child processes would be unable to communicate when "--map-by node" is used?

-Andrew

On Sunday, September 16, 2018 7:03:15 AM PDT Ralph H Castain wrote:
> I see you are using “preconnect_all” - that is the source of the trouble. I
> don’t believe we have tested that option in years and the code is almost
> certainly dead. I’d suggest removing that option and things should work.
>
> On Sep 15, 2018, at 1:46 PM, Andrew Benson <abenso...@gmail.com> wrote:
> >
> > I'm running into problems trying to spawn MPI processes across multiple
> > nodes on a cluster using recent versions of OpenMPI. Specifically, using
> > the attached Fortran code, compiled using OpenMPI 3.1.2 with:
> >
> > mpif90 test.F90 -o test.exe
> >
> > and run via a PBS scheduler using the attached test1.pbs, it fails as can
> > be seen in the attached testFAIL.err file.
> >
> > If I do the same but using OpenMPI v1.10.3 then it works successfully,
> > giving me the output in the attached testSUCCESS.err file.
> >
> > From testing a few different versions of OpenMPI it seems that the
> > behavior changed between v1.10.7 and v2.0.4.
> >
> > Is there some change in options needed to make this work with newer
> > OpenMPIs?
> >
> > Output from ompi_info --all is attached. config.log can be found here:
> >
> > http://users.obs.carnegiescience.edu/abenson/config.log.bz2
> >
> > Thanks for any help you can offer!
> >
> > -Andrew
> > <ompi_info.log.bz2><test.F90><test1.pbs><testFAIL.err.bz2><testSUCCESS.err.bz2>

--
* Andrew Benson: http://users.obs.carnegiescience.edu/abenson/contact.html
* Galacticus: https://bitbucket.org/abensonca/galacticus
parent process: rank, size, processor name = 6 16 node092
parent process: rank, size, processor name = 10 16 node092
parent process: rank, size, processor name = 12 16 node092
parent process: rank, size, processor name = 4 16 node092
parent process: rank, size, processor name = 3 16 node092
parent process: rank, size, processor name = 5 16 node092
parent process: rank, size, processor name = 15 16 node092
parent process: rank, size, processor name = 7 16 node092
parent process: rank, size, processor name = 8 16 node092
parent process: rank, size, processor name = 9 16 node092
parent process: rank, size, processor name = 13 16 node092
parent process: rank, size, processor name = 11 16 node092
parent process: rank, size, processor name = 14 16 node092
parent process: rank, size, processor name = 0 16 node092
parent process: rank, size, processor name = 1 16 node092
parent process: rank, size, processor name = 2 16 node092
parent passed interbarrier: rank = 1
parent passed interbarrier: rank = 3
parent passed interbarrier: rank = 15
parent passed interbarrier: rank = 7
parent passed interbarrier: rank = 8
parent passed interbarrier: rank = 9
parent passed interbarrier: rank = 11
parent passed interbarrier: rank = 2
parent passed interbarrier: rank = 14
parent passed interbarrier: rank = 0
parent passed interbarrier: rank = 4
parent passed interbarrier: rank = 12
parent passed interbarrier: rank = 5
parent passed interbarrier: rank = 6
parent passed interbarrier: rank = 13
parent passed interbarrier: rank = 10
child process: rank, size, processor name = 1 16 node091
child passed intrabarrier: rank = 1
child passed interbarrier: rank = 1
child process: rank, size, processor name = 2 16 node091
child passed intrabarrier: rank = 2
child passed interbarrier: rank = 2
child process: rank, size, processor name = 3 16 node091
child passed intrabarrier: rank = 3
child passed interbarrier: rank = 3
child process: rank, size, processor name = 4 16 node091
child passed intrabarrier: rank = 4
child passed interbarrier: rank = 4
child process: rank, size, processor name = 5 16 node091
child passed intrabarrier: rank = 5
child passed interbarrier: rank = 5
child process: rank, size, processor name = 8 16 node091
child passed intrabarrier: rank = 8
child passed interbarrier: rank = 8
child process: rank, size, processor name = 9 16 node091
child passed intrabarrier: rank = 9
child passed interbarrier: rank = 9
child process: rank, size, processor name = 10 16 node091
child passed intrabarrier: rank = 10
child passed interbarrier: rank = 10
child process: rank, size, processor name = 11 16 node091
child passed intrabarrier: rank = 11
child passed interbarrier: rank = 11
child process: rank, size, processor name = 12 16 node091
child passed intrabarrier: rank = 12
child passed interbarrier: rank = 12
child process: rank, size, processor name = 13 16 node091
child passed intrabarrier: rank = 13
child passed interbarrier: rank = 13
child process: rank, size, processor name = 14 16 node091
child passed intrabarrier: rank = 14
child passed interbarrier: rank = 14
child process: rank, size, processor name = 15 16 node091
child passed intrabarrier: rank = 15
child passed interbarrier: rank = 15
child process: rank, size, processor name = 0 16 node091
child passed intrabarrier: rank = 0
child passed interbarrier: rank = 0
child process: rank, size, processor name = 6 16 node091
child passed intrabarrier: rank = 6
child passed interbarrier: rank = 6
child process: rank, size, processor name = 7 16 node091
child passed intrabarrier: rank = 7
child passed interbarrier: rank = 7
_______________________________________________ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users