Looks like this thread accidentally got dropped; sorry!

More below.


> On May 4, 2019, at 10:40 AM, Eric F. Alemany via users 
> <users@lists.open-mpi.org> wrote:
> 
> Hi Gilles,
> 
> Thank you for your message and your suggestion. As you suggested, I tried 
> mpirun -np 84 --hostfile hostsfile --mca routed direct ./openmpi_hello.c
> 
> The command hangs with no output or error message until I hit "control + z". 
> Then I get the same error message as before.
> 
> To answer your question, here are my answers, which made me realize that the 
> Master node’s Open MPI version is 4.0.0 while the other (computational) nodes 
> are running Open MPI 4.0.1 - see the "ompi_info" output below. Could that be 
> the issue?

It *could* be, yes.  It would be worth making all the versions consistent.
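
For example, here's a quick way to compare versions across the whole cluster 
(just a sketch, assuming passwordless SSH to each node and that each line of 
your hostsfile starts with a hostname):

    for h in $(awk '{print $1}' hostsfile); do
        echo -n "$h: "
        ssh $h "ompi_info | grep 'Open MPI:'"
    done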

> In my “hostsfile" there are 7 nodes. I followed the FAQ instructions but I am 
> not sure if I created the “hostsfile” correctly. Each node in my cluster has 
> 32 cores, except the Master node.

Your hostfile looks fine.
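
For reference, the usual hostfile form for 32-core compute nodes is one line 
per node (node names here are hypothetical):

    node01 slots=32
    node02 slots=32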

This error is very, very strange to get on a real system.

Can you try two things:

1. Run with "mpirun --mca routed_base_verbose 100 ..." and send the full output.

2. Run with small subsets of nodes and see whether the problem is localized to 
specific nodes or combinations of nodes (see the sketch below).
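
For example (hypothetical node names, and assuming your hello world program is 
compiled into a binary named "openmpi_hello"):

    mpirun -np 2 --host node01,node02 ./openmpi_hello
    mpirun -np 2 --host node01,node03 ./openmpi_hello

If one pairing hangs and another runs cleanly, that points at a specific node.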

-- 
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
