Hi,

As mentioned in a previous message, can you give this a try:

mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp --mca pml ob1 ./mpitest

If it still hangs, the issue could be that Open MPI thinks some subnets are
reachable when they are not.
For diagnostics, add:

mpirun --mca btl_base_verbose 100 ...
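For example, combined with the command line above (just a sketch; adjust the
hostfile path and process count to your setup), that would look like:

mpirun --mca btl_base_verbose 100 -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest

The verbose output should show which interfaces and peer addresses the TCP BTL
attempts to use on each host, which usually makes the unreachable subnet obvious.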
You can explicitly include or exclude subnets (or interface names) with

--mca btl_tcp_if_include xxx

or

--mca btl_tcp_if_exclude yyy
For example,

mpirun --mca btl_tcp_if_include 192.168.0.0/24 -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest

should do the trick.
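Alternatively, if you know which interface is the troublesome one (a virtual
bridge such as virbr0 is a common culprit; that name is only an example), you
can exclude it instead. Note that btl_tcp_if_include and btl_tcp_if_exclude
are mutually exclusive, and if you override the exclude list you should keep
the loopback interface in it:

mpirun --mca btl_tcp_if_exclude lo,virbr0 -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest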
Cheers,
Gilles
On 4/4/2016 8:32 AM, dpchoudh . wrote:
Hello all
I don't mean to be competing for the 'silliest question of the year
award', but I can't figure this out on my own:
My 'cluster' has 2 machines, bigMPI and smallMPI. They are connected
via several (types of) networks and the connectivity is OK.
In this setup, the following program hangs after printing
Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!
Obviously it is hanging at MPI_Recv(). But why? My command line is as
follows, but the same thing happens if I use the openib BTL (instead of TCP) as well.
mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest
It must be something *really* trivial, but I am drawing a blank right now.
Please help!
#include <mpi.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv)
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);

    if (world_rank == 1)
    {
        /* rank 1 receives the 6-byte message (including the terminating NUL) from rank 0 */
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s\n", hostname, buf);
    }
    else
    {
        /* rank 0 sends "haha!" to rank 1 */
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s\n", hostname, buf);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
We learn from history that we never learn from history.