Unfortunately, removing the preconnect_all option didn't resolve the problem.

I tried changing a few of the other options that I pass to mpirun. What does
seem to make a difference is the "--map-by node" option. If I remove that
option then my test code runs successfully; the output is in the attached
test.err file.

Ideally I'd like to be able to use "--map-by node" so that the initial 
processes are distributed across the available resources. Is there some reason 
why the child processes would be unable to communicate when "--map-by node" is 
used?
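
For readers following along, the difference in initial placement can be pictured with a simplified model. This is only a sketch of fill-first versus round-robin placement, not Open MPI's actual mapper; the function names are hypothetical, and the figure of 16 slots per node is an assumption inferred from the 16 ranks per node in the test.err output.

```python
from itertools import cycle


def map_by_slot(nodes, slots_per_node, nprocs):
    """Fill every slot on a node before moving on to the next node."""
    if nprocs > len(nodes) * slots_per_node:
        raise ValueError("not enough slots for the requested processes")
    placement = []
    for node in nodes:
        for _ in range(slots_per_node):
            if len(placement) == nprocs:
                return placement
            placement.append(node)
    return placement


def map_by_node(nodes, slots_per_node, nprocs):
    """Deal ranks out round-robin, one node at a time."""
    if nprocs > len(nodes) * slots_per_node:
        raise ValueError("not enough slots for the requested processes")
    free = {node: slots_per_node for node in nodes}
    placement = []
    for node in cycle(nodes):
        if len(placement) == nprocs:
            break
        if free[node] > 0:
            free[node] -= 1
            placement.append(node)
    return placement


# Assumed allocation: two nodes with 16 slots each.
nodes = ["node092", "node091"]
print(map_by_slot(nodes, 16, 16))  # every rank lands on the first node
print(map_by_node(nodes, 16, 16))  # ranks alternate between the two nodes
```

Under these assumptions, fill-first placement puts all 16 parents on one node, which matches the node092/node091 split in test.err, while round-robin placement alternates the parents between the two nodes.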

-Andrew


On Sunday, September 16, 2018 7:03:15 AM PDT Ralph H Castain wrote:
> I see you are using “preconnect_all” - that is the source of the trouble. I
> don’t believe we have tested that option in years and the code is almost
> certainly dead. I’d suggest removing that option and things should work.
> 
> > On Sep 15, 2018, at 1:46 PM, Andrew Benson <abenso...@gmail.com> wrote:
> > 
> > I'm running into problems trying to spawn MPI processes across multiple
> > nodes on a cluster using recent versions of OpenMPI. Specifically, using
> > the attached Fortran code, compiled using OpenMPI 3.1.2 with:
> > 
> > mpif90 test.F90 -o test.exe
> > 
> > and run via a PBS scheduler using the attached test1.pbs, it fails as can
> > be seen in the attached testFAIL.err file.
> > 
> > If I do the same but using OpenMPI v1.10.3 then it works successfully,
> > giving me the output in the attached testSUCCESS.err file.
> > 
> > From testing a few different versions of OpenMPI it seems that the
> > behavior changed between v1.10.7 and v2.0.4.
> > 
> > Is there some change in options needed to make this work with newer
> > OpenMPIs?
> > 
> > Output from ompi_info --all is attached. config.log can be found here:
> > 
> > http://users.obs.carnegiescience.edu/abenson/config.log.bz2
> > 
> > Thanks for any help you can offer!
> > 
> > -Andrew
> > 
> > <ompi_info.log.bz2> <test.F90> <test1.pbs> <testFAIL.err.bz2>
> > <testSUCCESS.err.bz2>
> > 
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
> 


-- 

* Andrew Benson: http://users.obs.carnegiescience.edu/abenson/contact.html

* Galacticus: https://bitbucket.org/abensonca/galacticus
 parent process: rank, size, processor name =            6          16 node092
 parent process: rank, size, processor name =           10          16 node092
 parent process: rank, size, processor name =           12          16 node092
 parent process: rank, size, processor name =            4          16 node092
 parent process: rank, size, processor name =            3          16 node092
 parent process: rank, size, processor name =            5          16 node092
 parent process: rank, size, processor name =           15          16 node092
 parent process: rank, size, processor name =            7          16 node092
 parent process: rank, size, processor name =            8          16 node092
 parent process: rank, size, processor name =            9          16 node092
 parent process: rank, size, processor name =           13          16 node092
 parent process: rank, size, processor name =           11          16 node092
 parent process: rank, size, processor name =           14          16 node092
 parent process: rank, size, processor name =            0          16 node092
 parent process: rank, size, processor name =            1          16 node092
 parent process: rank, size, processor name =            2          16 node092
 parent passed interbarrier: rank =            1
 parent passed interbarrier: rank =            3
 parent passed interbarrier: rank =           15
 parent passed interbarrier: rank =            7
 parent passed interbarrier: rank =            8
 parent passed interbarrier: rank =            9
 parent passed interbarrier: rank =           11
 parent passed interbarrier: rank =            2
 parent passed interbarrier: rank =           14
 parent passed interbarrier: rank =            0
 parent passed interbarrier: rank =            4
 parent passed interbarrier: rank =           12
 parent passed interbarrier: rank =            5
 parent passed interbarrier: rank =            6
 parent passed interbarrier: rank =           13
 parent passed interbarrier: rank =           10
  child process: rank, size, processor name =            1          16 node091
  child passed intrabarrier: rank =            1
  child passed interbarrier: rank =            1
  child process: rank, size, processor name =            2          16 node091
  child passed intrabarrier: rank =            2
  child passed interbarrier: rank =            2
  child process: rank, size, processor name =            3          16 node091
  child passed intrabarrier: rank =            3
  child passed interbarrier: rank =            3
  child process: rank, size, processor name =            4          16 node091
  child passed intrabarrier: rank =            4
  child passed interbarrier: rank =            4
  child process: rank, size, processor name =            5          16 node091
  child passed intrabarrier: rank =            5
  child passed interbarrier: rank =            5
  child process: rank, size, processor name =            8          16 node091
  child passed intrabarrier: rank =            8
  child passed interbarrier: rank =            8
  child process: rank, size, processor name =            9          16 node091
  child passed intrabarrier: rank =            9
  child passed interbarrier: rank =            9
  child process: rank, size, processor name =           10          16 node091
  child passed intrabarrier: rank =           10
  child passed interbarrier: rank =           10
  child process: rank, size, processor name =           11          16 node091
  child passed intrabarrier: rank =           11
  child passed interbarrier: rank =           11
  child process: rank, size, processor name =           12          16 node091
  child passed intrabarrier: rank =           12
  child passed interbarrier: rank =           12
  child process: rank, size, processor name =           13          16 node091
  child passed intrabarrier: rank =           13
  child passed interbarrier: rank =           13
  child process: rank, size, processor name =           14          16 node091
  child passed intrabarrier: rank =           14
  child passed interbarrier: rank =           14
  child process: rank, size, processor name =           15          16 node091
  child passed intrabarrier: rank =           15
  child passed interbarrier: rank =           15
  child process: rank, size, processor name =            0          16 node091
  child passed intrabarrier: rank =            0
  child passed interbarrier: rank =            0
  child process: rank, size, processor name =            6          16 node091
  child passed intrabarrier: rank =            6
  child passed interbarrier: rank =            6
  child process: rank, size, processor name =            7          16 node091
  child passed intrabarrier: rank =            7
  child passed interbarrier: rank =            7
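
A quick way to confirm that a run like the one above completed is to check that every rank reported passing each barrier. The following is a minimal sketch, assuming the line format shown in this output; the function name `missing_ranks` and the file name in the usage note are illustrative and not part of the original test.

```python
import re

# Matches lines such as "  child passed interbarrier: rank =            7".
BARRIER_LINE = re.compile(
    r"^\s*(parent|child) passed (intra|inter)barrier: rank =\s*(\d+)\s*$"
)


def missing_ranks(lines, role, barrier, size):
    """Return the sorted ranks of `role` that never reported passing `barrier`.

    role is "parent" or "child"; barrier is "intra" or "inter";
    size is the expected number of ranks in that group.
    """
    seen = set()
    for line in lines:
        match = BARRIER_LINE.match(line)
        if match and match.group(1) == role and match.group(2) == barrier:
            seen.add(int(match.group(3)))
    return sorted(set(range(size)) - seen)
```

For example, `missing_ranks(open("test.err"), "child", "inter", 16)` returns an empty list when all 16 children reported the barrier labeled interbarrier; any ranks in the result point at processes that never got through.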