Hi,

Per a previous message, can you give the following a try:
mpirun -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest

If it still hangs, the issue could be that Open MPI thinks some subnets are reachable when they are not.

For diagnostics, add verbose BTL output:
mpirun --mca btl_base_verbose 100 ...

You can explicitly include or exclude subnets with
--mca btl_tcp_if_include xxx
or
--mca btl_tcp_if_exclude yyy

For example,
mpirun --mca btl_tcp_if_include 192.168.0.0/24 -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest
should do the trick.
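If you are not sure which subnet to pass, a quick check is to list the IPv4 addresses on each host and pick a subnet both machines actually share. A sketch (the subnet 192.168.0.0/24 and the interface names lo/virbr0 below are just examples; substitute what your hosts really have):

```shell
# Run on both bigMPI and smallMPI; pick a subnet that appears on both
# hosts and is actually routable between them.
ip -4 addr show

# Restrict the TCP BTL to that one subnet:
mpirun --mca btl_tcp_if_include 192.168.0.0/24 \
       --mca btl self,tcp --mca pml ob1 \
       -np 2 -hostfile ~/hostfile ./mpitest

# Or, alternatively, exclude interfaces the two hosts do NOT share
# (loopback plus, for example, a local virtual bridge):
mpirun --mca btl_tcp_if_exclude lo,virbr0 \
       --mca btl self,tcp --mca pml ob1 \
       -np 2 -hostfile ~/hostfile ./mpitest
```

Note that btl_tcp_if_include and btl_tcp_if_exclude are mutually exclusive; set one or the other, not both.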

Cheers,

Gilles



On 4/4/2016 8:32 AM, dpchoudh . wrote:
Hello all

I don't mean to be competing for the 'silliest question of the year award', but I can't figure this out on my own:

My 'cluster' has 2 machines, bigMPI and smallMPI. They are connected via several (types of) networks and the connectivity is OK.

In this setup, the following program hangs after printing

Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!


Obviously it is hanging in MPI_Recv(). But why? My command line is as follows, and the same thing happens if I try the openib BTL instead of tcp.

mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest

It must be something *really* trivial, but I am drawing a blank right now.

Please help!

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);
    if (world_rank == 1)
    {
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s\n", hostname, buf);
    }
    else
    {
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s\n", hostname, buf);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}



We learn from history that we never learn from history.


_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/04/28876.php
