I have run into a freeze (possibly a bug) when using MPI_Comm_accept in a
simple client/server implementation. I have attached the two simplest
programs I could produce that reproduce it:
1. mpi-receiver.c opens a port using MPI_Open_port and saves the port
name to a file
2. mpi-receiver enters an infinite loop and waits for connections using
MPI_Comm_accept
3. mpi-sender.c connects to that port using MPI_Comm_connect, sends
one MPI_UNSIGNED_LONG, calls MPI_Barrier and disconnects using
MPI_Comm_disconnect
4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls MPI_Barrier,
disconnects using MPI_Comm_disconnect and goes back to step 2 (the
infinite loop)
Everything works fine, but only exactly 5 times. After that the receiver
hangs in MPI_Recv, right after returning from MPI_Comm_accept. This is
100% reproducible. I have tried the same programs with Intel MPI and did
not see the problem.
I execute the programs using OpenMPI 1.10 as follows:
mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver
Do you have any clues what the reason could be? Am I doing something
wrong, or is it a problem with the internal state of OpenMPI?
Thanks a lot!
Marcin
/* mpi-receiver.c */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Info info;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);
    MPI_Open_port(info, port_name);
    printf("port name: %s\n", port_name);

    /* write port name to file */
    {
        FILE *fd;
        fd = fopen("port.txt", "w+");
        if (fd == NULL) {
            perror("fopen port.txt");
            MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
        }
        fprintf(fd, "%s", port_name);
        fclose(fd);
    }

    /* accept connections (never returns, so MPI_Finalize is not reached) */
    while (1) {
        unsigned long data;

        /* accept connection */
        MPI_Comm_accept(port_name, info, 0, MPI_COMM_WORLD, &intercomm);

        /* receive one unsigned long from the sender */
        MPI_Recv(&data, 1, MPI_UNSIGNED_LONG, 0, 1, intercomm, MPI_STATUS_IGNORE);
        printf("received data: %lx\n", data);

        MPI_Barrier(intercomm);
        MPI_Comm_disconnect(&intercomm);
        printf("client disconnected\n");
    }
}
/* mpi-sender.c */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char port_name[MPI_MAX_PORT_NAME+1];
    MPI_Info info;
    MPI_Comm intercomm;
    unsigned long data = 0x12345678;

    /* initialize MPI */
    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);

    /* the port name is passed as the first command-line argument */
    if (argc < 2) {
        fprintf(stderr, "usage: %s <port name>\n", argv[0]);
        MPI_Finalize();
        return 1;
    }
    /* bounded copy: snprintf always null-terminates and never overflows */
    snprintf(port_name, sizeof(port_name), "%s", argv[1]);

    /* connect to server - intercomm is the resulting intercommunicator */
    MPI_Comm_connect(port_name, info, 0, MPI_COMM_WORLD, &intercomm);
    printf("** connected\n");

    /* send data */
    MPI_Send(&data, 1, MPI_UNSIGNED_LONG, 0, 1, intercomm);
    MPI_Barrier(intercomm);

    /* disconnect */
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    printf("** disconnected\n");
    return 0;
}
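
The receiver writes the port name to port.txt, while the sender expects it
as its first command-line argument. As a minimal sketch (not part of the
attached programs), the sender could read the name from the file itself.
The helper name read_port_name and the constant PORT_NAME_BUF are
hypothetical; PORT_NAME_BUF only stands in for MPI_MAX_PORT_NAME so that
the sketch builds without mpi.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MPI_MAX_PORT_NAME (assumption, not an MPI value). */
#define PORT_NAME_BUF 256

/*
 * Hypothetical helper: read the first line of `path` into `buf`.
 * Returns 0 on success, -1 if the file cannot be opened, is empty,
 * or the port name does not fit into `size` bytes.
 */
int read_port_name(const char *path, char *buf, size_t size)
{
    FILE *fp;
    size_t len;
    int next;

    assert(path != NULL && buf != NULL);
    if (size < 2)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }

    /* Buffer full without a newline: peek at the next character.
     * Anything other than end-of-file or a newline means the name
     * was truncated, which is reported as an error. */
    len = strlen(buf);
    if (len == size - 1 && buf[len - 1] != '\n') {
        next = fgetc(fp);
        if (next != EOF && next != '\n') {
            fclose(fp);
            return -1;
        }
    }
    fclose(fp);

    /* Strip a trailing newline or carriage return, if present. */
    buf[strcspn(buf, "\r\n")] = '\0';
    return buf[0] == '\0' ? -1 : 0;
}
```

In mpi-sender.c a call such as read_port_name("port.txt", port_name,
sizeof(port_name)) could then take the place of copying argv[1], with the
return value checked before calling MPI_Comm_connect.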