Sounds like bad news about the threading. That's probably what's hanging me as
well. We're running clusters of multi-core SMPs, and our app NEEDS
multi-threading. It'd be nice to get an "official" reply on this from someone
on the dev team.
-David

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Mike Houston
Sent: Tuesday, March 13, 2007 5:52 AM
To: Open MPI Users
Subject: [OMPI users] Fun with threading

At least with 1.1.4, I'm having a heck of a time enabling multi-threading.
Configuring with --with-threads=posix --enable-mpi-threads
--enable-progress-threads leads to mpirun just hanging, even when not launching
MPI apps (e.g. mpirun -np 1 hostname), and I can't Ctrl-C to kill it; I have to
kill -9 it. Removing progress-thread support results in the same behavior.
Removing --enable-mpi-threads gets mpirun working again, but without the thread
protection I need.

What is the status of multi-threading support? From my reading of the mailing
lists, it looks like it's still largely untested. We have an application that
would be much easier to deal with if two threads in a process could both use
MPI. Funneling all MPI calls through a single thread creates a locking
nightmare, and generally means we're forced to spin, checking an Irecv and the
status of a data structure, instead of having one thread happily sitting in a
blocking receive and the other watching the data structure. That basically
pisses away a processor we could be using to do something useful. (We're
essentially doing a simplified version of DSM and need to respond to remote
data requests.)
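
Roughly, the structure we want is something like the sketch below. This assumes
MPI_THREAD_MULTIPLE actually works; the tag, the shutdown convention, and the
request handling are made up for illustration:

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define REQ_TAG 42                /* placeholder tag for remote data requests */

/* Dedicated thread: block on receives and answer remote data requests,
   instead of spinning on Irecv/Test in the main loop. */
static void *request_server(void *arg)
{
    (void)arg;
    while (1) {
        int req;
        MPI_Status status;
        MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, REQ_TAG,
                 MPI_COMM_WORLD, &status);
        if (req < 0)              /* made-up shutdown convention */
            break;
        /* ...look up the requested data and send it back... */
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, stop = -1;
    pthread_t server;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "no MPI_THREAD_MULTIPLE (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    pthread_create(&server, NULL, request_server, NULL);

    /* ...main thread works on the local data structure and makes its own
       MPI calls here... */

    /* Tell the server thread to exit (a self-send) before finalizing. */
    MPI_Send(&stop, 1, MPI_INT, rank, REQ_TAG, MPI_COMM_WORLD);
    pthread_join(server, NULL);

    MPI_Finalize();
    return 0;
}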

At the moment, running without threading support enabled, things are mostly
happy as long as we only post receives from a single thread, except when one
thread in a process sends to another thread in the same process that has posted
a receive. Under TCP, the send fails with:

*** An error occurred in MPI_Send
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL (goodbye)
[0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

Over SM, the results are undefined.
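
Stripped down, the pattern that dies looks roughly like this (the tag and
payload are made up, and note that nothing here ever requests thread support
from MPI):

#include <mpi.h>
#include <pthread.h>

#define TAG 7                     /* arbitrary tag for this sketch */

static int self_rank;

/* Second thread: send to our own rank, where the main thread has a
   receive posted. */
static void *sender(void *arg)
{
    int payload = 123;
    (void)arg;
    MPI_Send(&payload, 1, MPI_INT, self_rank, TAG, MPI_COMM_WORLD);
    return NULL;
}

int main(int argc, char **argv)
{
    int value;
    pthread_t t;

    MPI_Init(&argc, &argv);       /* no thread support requested */
    MPI_Comm_rank(MPI_COMM_WORLD, &self_rank);

    pthread_create(&t, NULL, sender, NULL);

    /* Main thread: blocking receive from our own rank; over TCP this is
       where the MPI_ERR_INTERN above shows up. */
    MPI_Recv(&value, 1, MPI_INT, self_rank, TAG, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    pthread_join(t, NULL);
    MPI_Finalize();
    return 0;
}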

Obviously I'm playing fast and loose here, which is why I'm trying to get
threading support working, to see if it solves these headaches. If you really
want to have some fun, post an MPI_Recv on one thread and issue an MPI_Barrier
on another (with SM):

Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x1c
[0] func:/usr/lib/libopal.so.0 [0xc030f4]
[1] func:/lib/tls/libpthread.so.0 [0x46f93890]
[2] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_match+0xb08) [0x14ec38]
[3] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback+0x2f9) [0x14f7e9]
[4] func:/usr/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0xa87) [0x806c07]
[5] func:/usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x39) [0x510c69]
[6] func:/usr/lib/libopal.so.0(opal_progress+0x69) [0xbecc39]
[7] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x785) [0x14d675]
[8] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual_localcompleted+0x8c) [0x5cc3fc]
[9] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_two_procs+0x76) [0x5ceef6]
[10] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_dec_fixed+0x38) [0x5cc638]
[11] func:/usr/lib/libmpi.so.0(PMPI_Barrier+0xe9) [0x29a1b9]
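
For reference, that case boils down to something like the following, run with
two processes over the sm btl (again just a sketch; the tag and message are
placeholders):

#include <mpi.h>
#include <pthread.h>

#define TAG 7                     /* arbitrary tag for this sketch */

/* Second thread on rank 0: enter the barrier while the main thread is
   still sitting in MPI_Recv. */
static void *barrier_thread(void *arg)
{
    (void)arg;
    MPI_Barrier(MPI_COMM_WORLD);
    return NULL;
}

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);       /* again, no thread support requested */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        pthread_t t;
        pthread_create(&t, NULL, barrier_thread, NULL);
        /* Posted receive concurrent with the barrier; with SM this is
           where the backtrace above appears. */
        MPI_Recv(&value, 1, MPI_INT, 1, TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        pthread_join(t, NULL);
    } else {
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Send(&value, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}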

-Mike