Hi folks,

I have been seeing some nasty behaviour in MPI_Send/MPI_Recv with a large dataset (8 MB) when OpenMP and Open MPI are used together over an IB interconnect. A program is attached below. The code first calls MPI_Init_thread(), followed by the OpenMP thread-creation API. The program works fine if we do single-sided communication [thread 0 of process 0 sending some data to any thread of process 1], but it hangs if both sides try to send data (8 MB) over the IB interconnect. Interestingly, the program works fine if we send short data (1 MB or below).

I see this with:

  openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
  OFED 1.2
  kernel 2.6.9-42.4sp.XCsmp
  icc (Intel compiler)

Compiled as:

  mpicc -O3 -openmp temp.c

Run as:

  mpirun -np 2 -hostfile nodelist a.out

The error I am getting is:

--------------------------------------------------------------------------
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128
error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
--------------------------------------------------------------------------

Is anyone else seeing something similar? Any ideas for workarounds? As a point of reference, the program works fine if we force Open MPI to select the TCP interconnect using --mca btl tcp,self.

-Neeraj
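P.S. One note for anyone reproducing this: the attached program assumes the 'provided' level returned by MPI_Init_thread() really is MPI_THREAD_MULTIPLE. A minimal guard along these lines (a sketch, not part of the attached code) would catch a silent downgrade before the threads start making concurrent MPI calls:

  /* Abort early if the library only granted a lower thread level. */
  if (provided < MPI_THREAD_MULTIPLE) {
      fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n", provided);
      MPI_Abort(MPI_COMM_WORLD, 1);
  }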
#include <stdio.h>
#include <mpi.h>
#include <omp.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#define MAX 1000000

int main(int argc, char *argv[])
{
    int required = MPI_THREAD_MULTIPLE;
    int provided;
    int rank;
    int size;
    int id;
    MPI_Status status;
    double *buff1, *buff2;

    /* Request full multi-threaded support; 'provided' reports what was granted. */
    MPI_Init_thread(&argc, &argv, required, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buff1 = (double *)malloc(sizeof(double) * MAX);  /* 8 MB of doubles */
    buff2 = (double *)malloc(sizeof(double) * MAX);

    omp_set_num_threads(2);
    #pragma omp parallel private(id)
    {
        id = omp_get_thread_num();
        if (rank == 0) {
            if (id == 0)
                /* thread 0 of rank 0 sends 8 MB to rank 1 (tag 0) */
                MPI_Send(buff1, MAX, MPI_DOUBLE, 1, rank, MPI_COMM_WORLD);
            else
                /* thread 1 of rank 0 receives 8 MB from rank 1 (tag 1234) */
                MPI_Recv(buff2, MAX, MPI_DOUBLE, 1, 1234, MPI_COMM_WORLD, &status);
        }
        if (rank == 1) {
            if (id == 0)
                /* thread 0 of rank 1 receives 8 MB from rank 0 (tag 0) */
                MPI_Recv(buff1, MAX, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            else
                /* thread 1 of rank 1 sends 8 MB to rank 0 (tag 1234) */
                MPI_Send(buff2, MAX, MPI_DOUBLE, 0, 1234, MPI_COMM_WORLD);
        }
    }

    printf("rank = %d provided = %d\n", rank, provided);
    free(buff1);
    free(buff2);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
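For reference, the exact run line we use for the TCP workaround mentioned above (same hostfile as before) is:

  mpirun -np 2 -hostfile nodelist --mca btl tcp,self a.out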