You cited Open MPI v2.1.1.  That's a pretty ancient version of Open MPI.

Any chance you can upgrade to Open MPI 4.0.x?
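In the meantime, a couple of command lines may help (assuming the same mpiexec and ./a.out from your example below); the first confirms which Open MPI you are actually invoking, and the other two are the standard ways to deal with the openib warning:

```shell
# Check which Open MPI version this mpiexec actually belongs to
mpiexec --version

# Silence the warning without changing anything else
mpiexec --mca btl_base_warn_component_unused 0 -n 4 ./a.out

# Or tell Open MPI not to try the OpenFabrics (openib) BTL at all
mpiexec --mca btl ^openib -n 4 ./a.out
```

Note that these only suppress or avoid the warning; they don't make the job use an HPC-class network.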



> On Jun 5, 2020, at 7:24 PM, Stephen Siegel <sie...@udel.edu> wrote:
> 
> 
> 
>> On Jun 5, 2020, at 6:55 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>> On Jun 5, 2020, at 6:35 PM, Stephen Siegel via users 
>> <users@lists.open-mpi.org> wrote:
>>> 
>>> [ilyich:12946] 3 more processes have sent help message 
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [ilyich:12946] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
>>> help / error messages
>> 
>> It looks like your output somehow doesn't include the actual error message.
> 
> You’re right; on this first machine I did not include all of the output.  It 
> is:
> 
> siegel@ilyich:~/372/code/mpi/io$ mpiexec -n 4 ./a.out
> --------------------------------------------------------------------------
> [[171,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>  Host: ilyich
> 
> Another transport will be used instead, although this may result in
> lower performance.
> 
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
> --------------------------------------------------------------------------
> 
> So, I’ll ask my people to look into how they configured this.
> 
> However, on the second machine, which uses SLURM, it consistently hangs on 
> this example, although many other examples using MPI I/O work fine.
> 
> -Steve
> 
> 
> 
> 
>> That error message was sent to stderr, so you may not have captured it if 
>> you only did "mpirun ... > foo.txt".  The actual error message template is 
>> this:
>> 
>> -----
>> %s: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>> 
>> Module: %s
>> Host: %s
>> 
>> Another transport will be used instead, although this may result in
>> lower performance.
>> 
>> NOTE: You can disable this warning by setting the MCA parameter
>> btl_base_warn_component_unused to 0.
>> -----
>> 
>> This is not actually an error -- just a warning.  It typically means that 
>> your Open MPI has support for HPC-class networking, Open MPI saw some 
>> evidence of HPC-class networking on the nodes on which your job ran, but 
>> ultimately didn't use any of those HPC-class networking interfaces for some 
>> reason and therefore fell back to TCP.
>> 
>> I.e., your program ran correctly, but it may have run slower than it could 
>> have if it were able to use HPC-class networks.
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com


-- 
Jeff Squyres
jsquy...@cisco.com