Hi everybody,

I am currently seeing a hang when launching a very simple MPI program with
mpirun across nodes connected over the network. It happens when I send an INT
and then a CHAR string from a master node to a worker node.
Here is the minimal code to reproduce it:


#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    const char someString[] = "Can haz cheezburgerz?";

    MPI_Init(&argc, &argv);

    MPI_Comm_rank( MPI_COMM_WORLD, & rank );
    MPI_Comm_size( MPI_COMM_WORLD, & size );

    if ( rank == 0 )
    {
        int len = strlen( someString );
        int i;
        for( i = 1; i < size; ++i)
        {
            MPI_Send( &len, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
            MPI_Send( someString, len+1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
        }
    } else {
        char buffer[ 128 ];
        int receivedLen;
        MPI_Status stat;
        MPI_Recv( &receivedLen, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
        printf( "[Worker] Length : %d\n", receivedLen );
        MPI_Recv( buffer, receivedLen+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
        printf( "[Worker] String : %s\n", buffer );
    }

    MPI_Finalize();
    return 0;
}



I know that there is a better way to send a string, by passing a maximum buffer
size to the second MPI_Recv, but that is not the main topic here.
The program works locally (i.e. when both processes are launched on one
machine), but not when the 2 processes are dispatched to 2 machines over the
network (i.e. one per host). In that case, the worker correctly reads the INT,
and then both master and worker block on the next call.
I have no issue when sending only char strings or only numbers. The hang only
happens when sending char strings and then numbers, or the other way around.

I'm using OpenMPI version 1.6, locally compiled. 
$ uname -a
Linux trtp7097 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 
x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release 
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)

Is this a misuse of the framework, or could it be a bug?

Thank you in advance.
Benjamin
