THREAD_MULTIPLE support does not work in the 1.2 series. Try turning it off.
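In practice that means not requesting MPI_THREAD_MULTIPLE at all. One way to do that is to ask for MPI_THREAD_FUNNELED and let only the master thread make MPI calls; the sketch below is illustrative only (it is not taken from the attached temp.c):

/* Sketch: request a lower thread level and funnel all MPI calls
 * through the master thread, so MPI_THREAD_MULTIPLE is never needed.
 * Illustrative only -- not the code from temp.c. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        fprintf(stderr, "MPI_THREAD_FUNNELED not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* ... OpenMP compute work in all threads ... */

        #pragma omp master
        {
            /* Only the thread that called MPI_Init_thread touches MPI. */
            int token = rank;
            MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
        }
        #pragma omp barrier
    }

    MPI_Finalize();
    return 0;
}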

On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:

Hi folks,

I have been seeing some nasty behaviour in MPI_Send/Recv with a large dataset (8 MB) when OpenMP and Open MPI are used together over the IB interconnect. A program is attached.

The code first calls MPI_Init_thread(), followed by the OpenMP thread-creation API. The program works fine if we do single-sided communication [thread 0 of process 0 sending some data to any thread of process 1], but it hangs if both sides try to send some data (8 MB) over the IB interconnect.

Interestingly, the program works fine if we send short data (1 MB or below).
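The attached temp.c is not reproduced here; the following is only a rough sketch of the pattern described above (buffer size, tags and thread count are illustrative):

/* Sketch of a bidirectional exchange between two ranks, each running
 * several OpenMP threads that exchange an 8 MB buffer with the peer.
 * Illustrative only -- the real reproducer is the attached temp.c. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_BYTES (8 * 1024 * 1024)   /* 8 MB per message */
#define NUM_THREADS 4

int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;              /* run with exactly 2 ranks */

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int tid = omp_get_thread_num();
        char *sendbuf = malloc(MSG_BYTES);
        char *recvbuf = malloc(MSG_BYTES);
        memset(sendbuf, tid, MSG_BYTES);

        /* Each thread exchanges 8 MB with the same-numbered thread on
         * the peer rank; MPI_Sendrecv keeps the pairing deadlock-free
         * at the MPI level, so any hang points at the transport. */
        MPI_Sendrecv(sendbuf, MSG_BYTES, MPI_CHAR, peer, tid,
                     recvbuf, MSG_BYTES, MPI_CHAR, peer, tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        free(sendbuf);
        free(recvbuf);
    }

    if (rank == 0) printf("exchange completed\n");
    MPI_Finalize();
    return 0;
}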

        I see this with

        openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
        ofed 1.2
        2.6.9-42.4sp.XCsmp
        icc (Intel Compiler)

        compiled as
                mpicc -O3 -openmp temp.c
        run as
                mpirun -np 2 -hostfile nodelist a.out

        The error I am getting is
----------------------------------------------------------------------

[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128

----------------------------------------------------------------------


        Anyone else seeing similar behaviour?  Any ideas for workarounds?
As a point of reference, the program works fine if we force Open MPI to select the TCP interconnect using --mca btl tcp,self.
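For reference, that workaround run (assuming the same nodelist and a.out as above) looks like

                mpirun -np 2 -hostfile nodelist --mca btl tcp,self a.out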

-Neeraj

<temp.c>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
