[OMPI users] mutex deadlock in btl tcp
Dear Open MPI users list,

From time to time, I experience a mutex deadlock in Open MPI 1.1.2. The stack trace is available at the end of the mail. The deadlock seems to be caused by lines 118 and 119 of the ompi/mca/btl/tcp/btl_tcp.c file, in the function mca_btl_tcp_add_procs:

OBJ_RELEASE(tcp_endpoint);
OPAL_THREAD_UNLOCK(&tcp_proc->proc_lock);

(Of course, I did not check whether the line numbers have changed since 1.1.2.) Releasing tcp_endpoint triggers a call to mca_btl_tcp_proc_remove, which attempts to acquire the mutex tcp_proc->proc_lock; that mutex is already held by the thread (OPAL_THREAD_LOCK(&tcp_proc->proc_lock) at line 103 of the same file). Swapping the two lines above (i.e., releasing the mutex before destroying tcp_endpoint) seems to be sufficient to fix the deadlock. Or should the changes made in the mca_btl_tcp_proc_insert function be reverted instead of releasing the mutex before destroying tcp_endpoint? As far as I can tell, the problem is still present in trunk revision 13359.

A second point: is there any reason why MPI_Comm_spawn is restricted to launching the new process(es) only on hosts listed either in the --host option or in the hostfile? Or did I miss something?

Best regards,
Jeremy

--
Stack trace as dumped by Open MPI (the gdb version follows):

opal_mutex_lock(): Resource deadlock avoided
Signal:6 info.si_errno:0(Success) si_code:-6()
[0] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libopal.so.0 [0x8addeb]
[1] func:/lib/tls/libpthread.so.0 [0x176e40]
[2] func:/lib/tls/libc.so.6(abort+0x1d5) [0xa294e5]
[3] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so [0x65f8a3]
[4] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0x2a) [0x65fff0]
[5] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so [0x65cb24]
[6] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so [0x659465]
[7] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x10f) [0x65927b]
[8] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x1bb) [0x628023]
[9] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd6) [0x61650b]
[10] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libmpi.so.0(ompi_comm_get_rport+0x1f8) [0xb82303]
[11] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libmpi.so.0(ompi_comm_connect_accept+0xbb) [0xb81b43]
[12] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libmpi.so.0(PMPI_Comm_spawn+0x3de) [0xbb671a]
[13] func:/home1/jbuisson/target/bin/mpi-spawner(__gxx_personality_v0+0x3d2) [0x804bb8e]
[14] func:/home1/jbuisson/target/bin/mpi-spawner [0x804bdff]
[15] func:/home1/jbuisson/target/bin/mpi-spawner [0x804bfd4]
[16] func:/lib/tls/libc.so.6(__libc_start_main+0xda) [0xa1578a]
[17] func:/home1/jbuisson/target/bin/mpi-spawner(__gxx_personality_v0+0x75) [0x804b831]
*** End of error message ***

Same stack, dumped by gdb:

#0  0x00176357 in __pause_nocancel () from /lib/tls/libpthread.so.0
#1  0x008ade9b in opal_show_stackframe (signo=6, info=0xbfff9290, p=0xbfff9310) at stacktrace.c:306
#2  <signal handler called>
#3  0x00a27cdf in raise () from /lib/tls/libc.so.6
#4  0x00a294e5 in abort () from /lib/tls/libc.so.6
#5  0x0065f8a3 in opal_mutex_lock (m=0x8ff8250) at ../../../../opal/threads/mutex_unix.h:104
#6  0x0065fff0 in mca_btl_tcp_proc_remove (btl_proc=0x8ff8220, btl_endpoint=0x900eba0) at btl_tcp_proc.c:296
#7  0x0065cb24 in mca_btl_tcp_endpoint_destruct (endpoint=0x900eba0) at btl_tcp_endpoint.c:99
#8  0x00659465 in opal_obj_run_destructors (object=0x900eba0) at ../../../../opal/class/opal_object.h:405
#9  0x0065927b in mca_btl_tcp_add_procs (btl=0x8e57c30, nprocs=1, ompi_procs=0x8ff7ac8, peers=0x8ff7ad8, reachable=0xbfff98e4) at btl_tcp.c:118
#10 0x00628023 in mca_bml_r2_add_procs (nprocs=1, procs=0x8ff7ac8, bml_endpoints=0x8ff60b8, reachable=0xbfff98e4) at bml_r2.c:231
#11 0x0061650b in mca_pml_ob1_add_procs (procs=0xbfff9930, nprocs=1) at pml_ob1.c:133
#12 0x00b82303 in ompi_comm_get_rport (port=0x0, send_first=0, proc=0x8e51c70, tag=2000) at communicator/comm_dyn.c:305
#13 0x00b81b43 in ompi_comm_connect_accept (comm=0x8ff8ce0, root=0, port=0x0, send_first=0, newcomm=0xbfff9a38, tag=2000) at communicator/comm_dyn.c:85
#14 0x00bb671a in PMPI_Comm_spawn (command=0x8ff88f0 "/home1/jbuisson/target/bin/sample-npb-ft-pp", argv=0xbfff9b40, maxprocs=1, info=0x8ff73e0, root=0, comm=0x8ff8ce0, intercomm=0xbfff9aa4, array_of_errcodes=0x0) at pcomm_spawn.c:110
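For readers who want to reproduce the lock ordering outside of Open MPI, here is a minimal plain-pthreads sketch. It is not the actual OPAL/BTL code: proc_t and endpoint_release are made-up stand-ins for tcp_proc and for the OBJ_RELEASE -> mca_btl_tcp_endpoint_destruct -> mca_btl_tcp_proc_remove path. With an error-checking mutex (which is roughly what debug builds use, per the mutex.c code quoted later in this thread), the nested lock fails with EDEADLK, matching the "Resource deadlock avoided" abort in the trace above; swapping the last two calls in main() (unlock before release) makes it go away, which is the fix proposed above.

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t proc_lock;   /* plays the role of tcp_proc->proc_lock */
} proc_t;

/* Stand-in for the endpoint destructor path, which needs the proc lock. */
static void endpoint_release(proc_t *proc)
{
    int rc = pthread_mutex_lock(&proc->proc_lock);
    if (rc == EDEADLK) {
        fprintf(stderr, "nested lock detected (EDEADLK), as in the abort above\n");
        return;
    }
    /* ... remove the endpoint from the proc's endpoint list ... */
    pthread_mutex_unlock(&proc->proc_lock);
}

int main(void)
{
    proc_t proc;
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);  /* catch recursive locks */
    pthread_mutex_init(&proc.proc_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&proc.proc_lock);    /* like OPAL_THREAD_LOCK at btl_tcp.c:103   */
    endpoint_release(&proc);                /* like OBJ_RELEASE(tcp_endpoint): deadlock */
    pthread_mutex_unlock(&proc.proc_lock);  /* the fix swaps these last two calls       */

    pthread_mutex_destroy(&proc.proc_lock);
    return 0;
}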
Re: [OMPI users] Open MPI error when using MPI_Comm_spawn
Ralph Castain wrote:
> The runtime underneath Open MPI (called OpenRTE) will not allow you to spawn
> processes on nodes outside of your allocation. This is for several reasons,
> but primarily because (a) we only know about the nodes that were allocated,
> so we have no idea how to spawn a process anywhere else, and (b) most
> resource managers wouldn't let us do it anyway.
>
> I gather you have some node that you know about and have hard-coded into
> your application? How do you know the name of the node if it isn't in your
> allocation??

Because I can give those names to Open MPI (or OpenRTE, or whatever). I would also like to do the same, and I don't want Open MPI to restrict me to what it thinks the allocation is when I'm sure I know better than it what I am doing. The concept of nodes belonging to an allocation fixed at launch time is really rigid, and it prevents the application (or anything else) from modifying the allocation at runtime, which could be quite useful.

Here is an ugly patch I quickly put together for my own use, which changes the round-robin rmaps component so that it first allocates the hosts to the rmgr, as a copy-and-paste of some code from the dash_host ras component. It is far from bug-free, but it can be a starting point for hacking.

Jeremy

> Ralph
>
> On 4/2/07 10:05 AM, "Prakash Velayutham" wrote:
>
>> Hello,
>>
>> I have built Open MPI (1.2) with the run-time environment enabled for the
>> Torque (2.1.6) resource manager. Initially I am requesting 4 nodes (1 CPU
>> each) from Torque. Then, from inside my MPI code, I am trying to spawn more
>> processes to nodes outside of the Torque-assigned nodes using
>> MPI_Comm_spawn, but this fails with the error below:
>>
>> [wins04:13564] *** An error occurred in MPI_Comm_spawn
>> [wins04:13564] *** on communicator MPI_COMM_WORLD
>> [wins04:13564] *** MPI_ERR_ARG: invalid argument of some other kind
>> [wins04:13564] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> mpirun noticed that job rank 1 with PID 15070 on node wins03 exited on
>> signal 15 (Terminated).
>> 2 additional processes aborted (not shown)
>>
>> #
>>
>> MPI_Info info;
>> MPI_Comm comm, *intercomm;
>> ...
>> ...
>> char *key, *value;
>> key = "host";
>> value = "wins08";
>> rc1 = MPI_Info_create(&info);
>> rc1 = MPI_Info_set(info, key, value);
>> rc1 = MPI_Comm_spawn(slave, MPI_ARGV_NULL, 1, info, 0,
>>                      MPI_COMM_WORLD, intercomm, arr);
>> ...
>> }
>>
>> ###
>>
>> Would this work as it is, or is something wrong with my assumption? Is
>> OpenRTE stopping me from spawning processes outside of the initially
>> allocated nodes through Torque?
>>
>> Thanks,
>> Prakash

diff -ru openmpi-1.2/ompi/mca/btl/tcp/btl_tcp.c openmpi-1.2-custom/ompi/mca/btl/tcp/btl_tcp.c
--- openmpi-1.2/ompi/mca/btl/tcp/btl_tcp.c	2006-11-09 19:53:44.0 +0100
+++ openmpi-1.2-custom/ompi/mca/btl/tcp/btl_tcp.c	2007-03-28 14:02:10.0 +0200
@@ -117,8 +117,8 @@
         tcp_endpoint->endpoint_btl = tcp_btl;
         rc = mca_btl_tcp_proc_insert(tcp_proc, tcp_endpoint);
         if(rc != OMPI_SUCCESS) {
-            OBJ_RELEASE(tcp_endpoint);
             OPAL_THREAD_UNLOCK(&tcp_proc->proc_lock);
+            OBJ_RELEASE(tcp_endpoint);
             continue;
         }
 
diff -ru openmpi-1.2/opal/threads/mutex.c openmpi-1.2-custom/opal/threads/mutex.c
--- openmpi-1.2/opal/threads/mutex.c	2006-11-09 19:53:32.0 +0100
+++ openmpi-1.2-custom/opal/threads/mutex.c	2007-03-28 15:59:25.0 +0200
@@ -54,6 +54,8 @@
 #elif OMPI_ENABLE_DEBUG && OMPI_HAVE_PTHREAD_MUTEX_ERRORCHECK
     /* set type to ERRORCHECK so that we catch recursive locks */
     pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
+#else
+    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 #endif
 
     pthread_mutex_init(&m->m_lock_pthread, &attr);
diff -ru openmpi-1.2/opal/threads/mutex_unix.h openmpi-1.2-custom/opal/threads/mutex_unix.h
--- openmpi-1.2/opal/threads/mutex_unix.h	2006-11-09 19:53:32.0 +0100
+++ openmpi-1.2-custom/opal/threads/mutex_unix.h	2007-03-28 15:36:13.0 +0200
@@ -76,7 +76,7 @@
 
 static inline int opal_mutex_trylock(opal_mutex_t *m)
 {
-#if OMPI_ENABLE_DEBUG
+#if 1 // OMPI_ENABLE_DEBUG
     int ret = pthread_mutex_trylock(&m->m_lock_pthread);
     if (ret == EDEADLK) {
         errno = ret;
@@ -91,7 +91,7 @@
 
 static inline void opal_mutex_lock(opal_mutex_t *m)
 {
-#if OMPI_ENABLE_DEBUG
+#if 1 // OMPI_ENABLE_DEBUG
     int ret = pthread_mutex_lock(&m->m_lock_pthread);
     if (ret == EDEADLK) {
         errno = ret;
diff -ru
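For completeness, here is a self-contained sketch of the spawn code quoted above. It is an assumption-laden example, not Prakash's actual program: "./slave" is a placeholder path, while the "host" info key and the "wins08" host name are taken from his mail. As Ralph explained, with an unpatched Open MPI the host named in the info key still has to be part of the allocation (hostfile, --host, or the Torque-assigned nodes), otherwise the spawn fails with MPI_ERR_ARG as shown in the quoted output.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;
    int spawn_err = MPI_SUCCESS;   /* error code for the single spawned process */

    MPI_Init(&argc, &argv);

    /* Ask the runtime to place the new process on a specific host.
     * "wins08" comes from the quoted mail; with stock Open MPI it must
     * already belong to the allocation. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "wins08");

    /* "./slave" is a placeholder for the slave executable. */
    MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_WORLD, &intercomm, &spawn_err);
    if (spawn_err != MPI_SUCCESS) {
        fprintf(stderr, "spawn of the slave failed with error %d\n", spawn_err);
    }

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

Note that with the default MPI_ERRORS_ARE_FATAL handler a failed spawn aborts the job (as in the quoted output) rather than returning an error code to the caller.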