Sounds like bad news about the threading. That's probably what's hanging me as
well. We're running clusters of multi-core SMPs, and our app NEEDS
multi-threading. It'd be nice to get an "official" reply on this from someone
on the dev team.
-David
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Mike Houston
Sent: Tuesday, March 13, 2007 5:52 AM
To: Open MPI Users
Subject: [OMPI users] Fun with threading
At least with 1.1.4, I'm having a heck of a time enabling
multi-threading. Configuring with --with-threads=posix
--enable-mpi-threads --enable-progress-threads leads to mpirun just
hanging, even when not launching MPI apps (e.g. mpirun -np 1 hostname),
and I can't ctrl-c to kill it; I have to kill -9 it. Removing progress
threads support results in the same behavior. Removing
--enable-mpi-threads gets mpirun working again, but then I don't get the
thread protection I need.
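For reference, what I'm after is MPI_THREAD_MULTIPLE. A minimal sketch of the
check I'd run against a threaded build (the error handling here is just for
illustration):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for full multi-threading; 'provided' reports what the library
       actually grants. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "only got thread level %d\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... both threads would now be free to make MPI calls ... */
    MPI_Finalize();
    return 0;
}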
What is the status of multi-threading support? From my reading of the
mailing lists, it looks like it's still largely untested. We actually have
an application that would be much easier to deal with if we could have
two threads in a process both using MPI. Funneling everything through a
single thread creates a locking nightmare, and generally means we will be
forced to spin, checking an MPI_Irecv and the status of a data structure,
instead of having one thread happily sitting on a blocking receive and the
other watching the data structure, basically wasting a processor that we
could be using to do something useful. (We are basically doing a
simplified version of DSM, and we need to respond to remote data
requests.)
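To make that concrete, here's a rough sketch of the polling loop we're stuck
with today (request_buf, BUF_LEN, REQUEST_TAG, handle_remote_request(),
local_work_pending() and do_local_work() are placeholders for our DSM
bookkeeping):

    MPI_Request req;
    MPI_Status  status;
    int         done;

    MPI_Irecv(request_buf, BUF_LEN, MPI_BYTE, MPI_ANY_SOURCE,
              REQUEST_TAG, MPI_COMM_WORLD, &req);
    for (;;) {
        /* Poll the network... */
        MPI_Test(&req, &done, &status);
        if (done) {
            handle_remote_request(request_buf, &status);
            MPI_Irecv(request_buf, BUF_LEN, MPI_BYTE, MPI_ANY_SOURCE,
                      REQUEST_TAG, MPI_COMM_WORLD, &req);
        }
        /* ...and poll the data structure the second thread would otherwise
           watch. This spin is the processor we're wasting. */
        if (local_work_pending())
            do_local_work();
    }

With working thread support we'd replace all of this with one thread blocked
in MPI_Recv and the other blocked on the data structure.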
At the moment, it seems that when running without threading support
enabled, if we only post receives on a single thread, things are mostly
happy, except when one thread in a process sends to the other thread in
the same process, which has posted a receive. Under TCP, the send fails with:
*** An error occurred in MPI_Send
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL (goodbye)
[0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104
Over SM (shared memory), the results are undefined.
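The pattern that triggers it is nothing exotic; roughly the following sketch,
where my_rank and SELF_TAG are placeholders and the pthread creation and MPI
setup are omitted:

    /* Thread A: posts a receive from its own rank. */
    void *recv_thread(void *arg)
    {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, my_rank, SELF_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        return NULL;
    }

    /* Thread B: sends to the same rank, i.e. to thread A. */
    void *send_thread(void *arg)
    {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, my_rank, SELF_TAG, MPI_COMM_WORLD);
        return NULL;
    }

Over TCP the MPI_Send is where the MPI_ERR_INTERN above shows up.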
Obviously I'm playing fast and loose, which is why I'm attempting to get
threading support working, to see if it solves the headaches. If you
really want to have some fun, post an MPI_Recv on one thread and
issue an MPI_Barrier on the other (with SM):
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x1c
[0] func:/usr/lib/libopal.so.0 [0xc030f4]
[1] func:/lib/tls/libpthread.so.0 [0x46f93890]
[2] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_match+0xb08) [0x14ec38]
[3] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback+0x2f9) [0x14f7e9]
[4] func:/usr/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0xa87) [0x806c07]
[5] func:/usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x39) [0x510c69]
[6] func:/usr/lib/libopal.so.0(opal_progress+0x69) [0xbecc39]
[7] func:/usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x785) [0x14d675]
[8] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual_localcompleted+0x8c) [0x5cc3fc]
[9] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_two_procs+0x76) [0x5ceef6]
[10] func:/usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_dec_fixed+0x38) [0x5cc638]
[11] func:/usr/lib/libmpi.so.0(PMPI_Barrier+0xe9) [0x29a1b9]
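For completeness, the pattern behind that trace is just the following sketch
(DATA_TAG is a placeholder and the pthread plumbing is again omitted):

    /* Thread A: sits in a blocking receive. */
    void *recv_side(void *arg)
    {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, DATA_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        return NULL;
    }

    /* Thread B: enters the barrier while thread A is still in MPI_Recv;
       with SM this is what produces the segfault above. */
    void *barrier_side(void *arg)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        return NULL;
    }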
-Mike