Thanks for your reply, but the program does run on the TCP interconnect with the same data size, and also on IB with a small data size, say 1 MB. So I don't think the problem is in Open MPI as a whole; it has to do with the IB logic, which probably doesn't work well with threads. I also tried the program with MPI_THREAD_SERIALIZED, but in vain. When is version 1.3 scheduled to be released? Would it fix such issues?

Correct me if I am wrong.

-Neeraj

On Wed, 31 Oct 2007 05:31:32 -0700, Open MPI Users wrote:

THREAD_MULTIPLE support does not work in the 1.2 series. Try turning it off.

On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:

> Hi folks,
>
> I have been seeing some nasty behaviour in MPI_Send/Recv with a large
> data set (8 MB) when OpenMP and Open MPI are used together with the IB
> interconnect. Attached is a program.
>
> The code first calls MPI_Init_thread(), followed by the OpenMP thread
> creation API. The program works fine if we do single-sided
> communication [thread 0 of process 0 sending some data to any thread
> of process 1], but it hangs if both sides try to send some data (8 MB)
> using the IB interconnect.
>
> Interestingly, the program works fine if we send short data (1 MB or
> below).
>
> I see this with:
>
>         openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
>         OFED 1.2
>         2.6.9-42.4sp.XCsmp
>         icc (Intel Compiler)
>
> Compiled as:
>         mpicc -O3 -openmp temp.c
> Run as:
>         mpirun -np 2 -hostfile nodelist a.out
>
> The error I am getting is:
>
> ----------------------------------------------------------------------
>
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from
> n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR
> status number 4 for wr_id 6391728 opcode 0
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from
> n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED
> ERROR status number 5 for wr_id 7058304 opcode 128
> [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from
> n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED
> ERROR status number 5 for wr_id 6854256 opcode 128
> [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from
> n115 to: n129 error polling LP CQ with status LOCAL LENGTH ERROR
> status number 1 for wr_id 6920112 opcode 0
>
> ----------------------------------------------------------------------
>
> Anyone else seeing similar? Any ideas for workarounds?
>
> As a point of reference, the program works fine if we force Open MPI
> to select the TCP interconnect using --mca btl tcp,self.
>
> -Neeraj
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
Cisco Systems
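For context, here is a minimal sketch of the kind of reproducer the quoted message describes. The actual attachment is not shown in this thread, so the thread count, the per-thread MPI_Sendrecv exchange pattern, and all buffer handling are assumptions, not the original program:

```c
/* Hypothetical reproducer: both ranks exchange 8 MB from OpenMP threads.
 * Build/run as in the quoted message:
 *   mpicc -O3 -openmp temp.c
 *   mpirun -np 2 -hostfile nodelist a.out
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    enum { MSG_BYTES = 8 * 1024 * 1024 };  /* 8 MB: the size reported to hang */
    int provided, rank, peer;

    /* Request full thread support; per the reply above, THREAD_MULTIPLE does
     * not work in the 1.2 series, so check what was actually granted. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "warning: provided thread level is only %d\n", provided);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;  /* intended for exactly two ranks (-np 2) */

    /* Several threads per rank exchange 8 MB with the peer at once, using the
     * thread number as the tag so pairs match up.  MPI_Sendrecv cannot
     * deadlock by itself, so a hang here points at the interconnect/threading
     * layer rather than the communication pattern. */
#pragma omp parallel num_threads(4)  /* thread count is an assumption */
    {
        int tag = omp_get_thread_num();
        char *sbuf = calloc(MSG_BYTES, 1);
        char *rbuf = calloc(MSG_BYTES, 1);

        MPI_Sendrecv(sbuf, MSG_BYTES, MPI_CHAR, peer, tag,
                     rbuf, MSG_BYTES, MPI_CHAR, peer, tag,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        free(sbuf);
        free(rbuf);
    }

    MPI_Finalize();
    return 0;
}
```

With a correct MPI_THREAD_MULTIPLE implementation this should complete; per the report above, it works over TCP (`--mca btl tcp,self`) or with 1 MB messages, but hangs with 8 MB over the openib BTL in the 1.2 series.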
