Thank you, Gus!  I did go through the mpiexec/mpirun man pages but wasn't quite 
clear that -report-bindings was what I was looking for.   So I reran a program 
with --report-bindings, but no bindings were reported.

Scratching my head, I decided to include --bind-to-core as well.  Voila, the 
bindings are reported!  

Awesome, but now here is my concern.  If we have OpenMPI-based applications 
launched as batch jobs via a batch scheduler like SLURM, PBS, LSF, etc. (which 
decides the placement of the app and dispatches it to the compute hosts), then 
will including "--report-bindings --bind-to-core" cause problems?   Certainly I 
can test this, but I am concerned there may be a case where including 
--bind-to-core would cause an unexpected problem I did not account for.

--john


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
Sent: Thursday, March 27, 2014 2:06 PM
To: Open MPI Users
Subject: Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)

Hi John

Take a look at the mpiexec/mpirun options:

-report-bindings (this one should report what you want)

and maybe also:

-bycore, -bysocket, -bind-to-core, -bind-to-socket, ...

and similar, if you want more control over where your MPI processes run.
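
For the original question (which rank ran on which host), a wrapper around the
application is another option that needs no access to its source. The sketch
below is only an illustration, not something taken from the man page. It
assumes Open MPI exports OMPI_COMM_WORLD_RANK to each process it starts; the
file name rank_host.py and the helper rank_host_line are made up for the
example.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: report "rank -> host", then run the real program."""
import os
import socket
import sys


def rank_host_line(environ, hostname):
    # Assumption: Open MPI's mpirun sets OMPI_COMM_WORLD_RANK per process.
    # Outside of mpirun the variable is absent, so fall back to "unknown".
    rank = environ.get("OMPI_COMM_WORLD_RANK", "unknown")
    return f"rank {rank} -> host {hostname}"


def main(argv):
    # Write to stderr so the application's stdout stays untouched.
    print(rank_host_line(os.environ, socket.gethostname()),
          file=sys.stderr, flush=True)
    if len(argv) > 1:
        # Replace this wrapper process with the real application.
        os.execvp(argv[1], argv[1:])


if __name__ == "__main__":
    main(sys.argv)
```

Hypothetical usage: mpirun -np 4 python3 rank_host.py ./your_app <args>.
Each process writes one "rank N -> host NAME" line to stderr before the
application starts, so a failing rank number can be matched to a host.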

"man mpiexec" is your friend!

I hope this helps,
Gus Correa

On 03/27/2014 01:53 PM, Sasso, John (GE Power & Water, Non-GE) wrote:
> When a piece of software built against OpenMPI fails, I will see an 
> error referring to the rank of the MPI task which incurred the failure.
> For example:
>
> MPI_ABORT was invoked on rank 1236 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> Unfortunately, I do not have access to the software code, just the 
> installation directory tree for OpenMPI.  My question is:  Is there a 
> flag that can be passed to mpirun, or an environment variable set, 
> which would reveal the mapping of ranks to the hosts they are on?
>
> I do understand that one could have multiple MPI ranks running on the 
> same host, but finding a way to determine which rank ran on what host 
> would go a long way in helping to troubleshoot problems which may be 
> central to the host.  Thanks!
>
>                    --john
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
