THREAD_MULTIPLE support does not work in the 1.2 series. Try turning
it off.
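If your code can get by with a lower thread level, the usual pattern is to ask for something below MPI_THREAD_MULTIPLE and check what the library actually gives back. A minimal sketch (not taken from your temp.c):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Ask for FUNNELED instead of MULTIPLE and verify the result. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        if (provided < MPI_THREAD_FUNNELED) {
            fprintf(stderr, "needed MPI_THREAD_FUNNELED, got %d\n", provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* With FUNNELED, only the thread that called MPI_Init_thread
           (e.g. the OpenMP master thread) may make MPI calls. */
        MPI_Finalize();
        return 0;
    }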
On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:
Hi folks,
I have been seeing some nasty behaviour in MPI_Send/Recv with a large dataset (8 MB) when OpenMP and Open MPI are used together over the IB interconnect. Attached is a program.
The code first calls MPI_Init_thread(), followed by the OpenMP thread-creation API. The program works fine if we do single-sided communication [thread 0 of process 0 sending some data to any thread of process 1], but it hangs if both sides try to send data (8 MB) over the IB interconnect.
Interestingly, the program works fine if we send small data (1 MB or below).
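In outline the code looks something like the sketch below (a simplified version of what temp.c does; the thread count, tags, and send/recv ordering here are illustrative rather than the exact attachment):

    #include <mpi.h>
    #include <omp.h>
    #include <stdlib.h>

    #define N (8 * 1024 * 1024)   /* ~8 MB payload, the size that hangs */

    int main(int argc, char **argv)
    {
        int provided, rank, peer;

        /* Request full thread support before spawning OpenMP threads. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = (rank == 0) ? 1 : 0;

        #pragma omp parallel num_threads(2)
        {
            int tid = omp_get_thread_num();
            char *buf = malloc(N);
            MPI_Status st;

            /* Both processes exchange 8 MB from each thread; messages
               are matched by using the thread id as the tag. */
            if (rank == 0) {
                MPI_Send(buf, N, MPI_CHAR, peer, tid, MPI_COMM_WORLD);
                MPI_Recv(buf, N, MPI_CHAR, peer, tid, MPI_COMM_WORLD, &st);
            } else {
                MPI_Recv(buf, N, MPI_CHAR, peer, tid, MPI_COMM_WORLD, &st);
                MPI_Send(buf, N, MPI_CHAR, peer, tid, MPI_COMM_WORLD);
            }
            free(buf);
        }

        MPI_Finalize();
        return 0;
    }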
I see this with:
openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
OFED 1.2
kernel 2.6.9-42.4sp.XCsmp
icc (Intel compiler)
Compiled as:
mpicc -O3 -openmp temp.c
Run as:
mpirun -np 2 -hostfile nodelist a.out
The error I am getting is:
--------------------------------------------------------------------------
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128
--------------------------------------------------------------------------
Is anyone else seeing anything similar? Any ideas for workarounds?
As a point of reference, the program works fine if we force Open MPI to select the TCP interconnect using --mca btl tcp,self.
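Concretely, that means launching with something like:

    mpirun -np 2 -hostfile nodelist --mca btl tcp,self a.out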
-Neeraj
<temp.c>
--
Jeff Squyres
Cisco Systems