Thanks for your reply, but the program runs fine over the TCP
interconnect with the same data size, and also over IB with a small data
size, say 1 MB. So I don't think the problem is in OpenMPI; it has
something to do with the IB logic, which probably doesn't work well with
threads.

I also tried the program with MPI_THREAD_SERIALIZED, but in vain. When is
version 1.3 scheduled to be released? Would it fix such issues?

Correct me if I am wrong.

-Neeraj

On Wed, 31 Oct 2007 05:31:32 -0700 Open MPI Users wrote:

THREAD_MULTIPLE support does not work in the 1.2 series. Try turning it
off.

On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:

> Hi folks,
>
> I have
> been seeing some nasty behaviour in MPI_Send/Recv with large datasets
> (8 MB) when OpenMP and Open MPI are used together over the IB
> interconnect. Attached is a program.
>
> The code first calls MPI_Init_thread(), followed by the OpenMP thread
> creation API. The program works fine if communication goes in one
> direction only [thread 0 of process 0 sending some data to any thread
> of process 1], but it hangs if both sides try to send some data (8 MB)
> over the IB interconnect.
>
> Interesting to note that the program works fine if we send short data
> (1 MB or below).
>
> I see this with
>
>   openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
>   ofed 1.2
>   2.6.9-42.4sp.XCsmp
>   icc (Intel Compiler)
>
> compiled as
>   mpicc -O3 -openmp temp.c
> run as
>   mpirun -np 2 -hostfile nodelist a.out
>
> The error I am getting is
>
> ----------------------------------------------------------------------
>
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
> [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128
> error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
>
> ----------------------------------------------------------------------
>
> Anyone else seeing similar? Any
> ideas for workarounds?
> As a point of reference, the program works fine if we force Open MPI
> to select the TCP interconnect using --mca btl tcp,self.
>
> -Neeraj
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
Cisco Systems
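
For anyone reading the error output quoted in this thread: the "status number" values printed by the openib BTL appear to line up with the first entries of the `ibv_wc_status` enum in libibverbs (1, 4 and 5 match the LOCAL LENGTH, LOCAL PROTOCOL and WORK REQUEST FLUSHED messages in the log). A minimal lookup sketch under that assumption follows. The function name `wc_status_name` is made up for illustration and is not part of Open MPI or libibverbs; the values are hand-copied, and only the codes seen in this thread (plus success) are covered.

```c
#include <stddef.h>

/* Hypothetical helper, not an Open MPI or libibverbs API.
 * Maps the "status number" from the log to the matching
 * ibv_wc_status enumerator name. Only the codes that appear
 * in this thread, plus 0 for success, are listed. */
const char *wc_status_name(int status)
{
    switch (status) {
    case 0:  return "IBV_WC_SUCCESS";
    case 1:  return "IBV_WC_LOC_LEN_ERR";   /* "LOCAL LENGTH ERROR" */
    case 4:  return "IBV_WC_LOC_PROT_ERR";  /* "LOCAL PROTOCOL ERROR" */
    case 5:  return "IBV_WC_WR_FLUSH_ERR";  /* "WORK REQUEST FLUSHED ERROR" */
    default: return NULL;                   /* not covered by this sketch */
    }
}
```

Calling `wc_status_name(5)` returns "IBV_WC_WR_FLUSH_ERR". Flush errors are usually a follow-on effect: once a queue pair enters the error state, its outstanding work requests are flushed. So the status 4 and status 1 entries are probably the more informative ones when debugging.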