You cited Open MPI v2.1.1, which is a pretty ancient version. Any chance you can upgrade to the 4.0.x series?
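(If it helps, a quick way to double-check which version a given install reports -- this assumes the mpiexec you invoke actually belongs to that install; the output will look something like the below:

    shell$ mpiexec --version
    mpiexec (OpenRTE) 2.1.1

ompi_info reports the same version information in its first few lines.)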
> On Jun 5, 2020, at 7:24 PM, Stephen Siegel <sie...@udel.edu> wrote:
> 
>> On Jun 5, 2020, at 6:55 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> 
>> On Jun 5, 2020, at 6:35 PM, Stephen Siegel via users <users@lists.open-mpi.org> wrote:
>>> 
>>> [ilyich:12946] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
>>> [ilyich:12946] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>> 
>> It looks like your output somehow doesn't include the actual error message.
> 
> You're right, on this first machine I did not include all of the output. It is:
> 
> siegel@ilyich:~/372/code/mpi/io$ mpiexec -n 4 ./a.out
> --------------------------------------------------------------------------
> [[171,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
> Host: ilyich
> 
> Another transport will be used instead, although this may result in
> lower performance.
> 
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
> --------------------------------------------------------------------------
> 
> So, I'll ask my people to look into how they configured this.
> 
> However, on the second machine, which uses SLURM, it consistently hangs on this example, although many other examples using MPI I/O work fine.
> 
> -Steve
> 
>> That error message was sent to stderr, so you may not have captured it if you only did "mpirun ... > foo.txt". The actual error message template is this:
>> 
>> -----
>> %s: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>> 
>> Module: %s
>> Host: %s
>> 
>> Another transport will be used instead, although this may result in
>> lower performance.
>> 
>> NOTE: You can disable this warning by setting the MCA parameter
>> btl_base_warn_component_unused to 0.
>> -----
>> 
>> This is not actually an error -- just a warning. It typically means that your Open MPI has support for HPC-class networking, Open MPI saw some evidence of HPC-class networking on the nodes on which your job ran, but ultimately didn't use any of those HPC-class networking interfaces for some reason and therefore fell back to TCP.
>> 
>> I.e., your program ran correctly, but it may have run slower than it could have if it had been able to use HPC-class networks.
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com

-- 
Jeff Squyres
jsquy...@cisco.com
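P.S. A couple of practical knobs while you sort out the configuration -- this is the standard "--mca" command-line syntax; substitute your own program for ./a.out:

    shell$ mpiexec --mca btl_base_warn_component_unused 0 -n 4 ./a.out
    shell$ mpiexec --mca orte_base_help_aggregate 0 -n 4 ./a.out

The first silences the openib warning; the second shows every copy of an aggregated help message instead of the summary. And since these messages go to stderr, redirect both streams if you're capturing output to a file, e.g. "mpiexec -n 4 ./a.out > foo.txt 2>&1".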