Hi,

What are the advantages of the progress-threads feature?
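For context, here is a minimal sketch of the kind of communication/computation overlap I understand progress threads are meant to provide. The buffer sizes and the compute() kernel are made up for illustration; this is not from any particular application:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)   /* message size in doubles; arbitrary for illustration */

    /* Stand-in for real application work (hypothetical kernel). */
    static void compute(double *buf, int n)
    {
        for (int i = 0; i < n; i++)
            buf[i] = buf[i] * 1.0001 + 1.0;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *msg  = calloc(N, sizeof(double));
        double *work = calloc(4096, sizeof(double));

        if (rank == 0 && size > 1) {
            MPI_Request req;
            /* Start a large non-blocking send... */
            MPI_Isend(msg, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);

            /* ...then go compute. Without asynchronous progress, the send
             * typically only advances while the process is inside MPI calls;
             * with a progress thread the library can drain it in the
             * background while compute() runs. */
            compute(work, 4096);

            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(msg, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        printf("rank %d done\n", rank);
        free(msg);
        free(work);
        MPI_Finalize();
        return 0;
    }

(I realize from the discussion below that support for this is still incomplete.)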
Thanks,
Sangamesh

On Fri, Jan 8, 2010 at 10:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Yeah, the system doesn't currently support enable-progress-threads. It is a
> two-fold problem: ORTE won't work that way, and some parts of the MPI layer
> won't either.
>
> I am currently working on fixing ORTE so it will work with progress threads
> enabled. I believe (but can't confirm) that the TCP BTL will also work with
> that feature, but I have heard that the other BTLs won't (again, can't
> confirm).
>
> I'll send out a note when ORTE is okay, but that won't be included in a
> release for a while.
>
> On Jan 8, 2010, at 9:38 AM, Dong Li wrote:
>
> > Hi, guys.
> > My application got stuck when I ran it under Open MPI 1.4 with progress
> > threads enabled.
> >
> > Open MPI was configured and compiled with the following options:
> > ./configure --with-openib=/usr --enable-trace --enable-debug
> >     --enable-peruse --enable-progress-threads
> >
> > I then started the application with two MPI processes, but there seems
> > to be some problem with ORTE: mpiexec just hangs and never runs the
> > application. I used gdb to attach to mpiexec to find out where the
> > program got stuck. The backtraces for the two MPI processes (i.e. rank 0
> > and rank 1) are shown below. It looks to me like the problem happens in
> > rank 0 when it tries to do an atomic add operation. Note that my
> > processor is an Intel Xeon E5462, but Open MPI tried to use AMD64
> > instructions for the atomic add. Is this a bug or something?
> >
> > Any comment? Thank you.
> >
> > -Dong
> >
> > ***********************************************************************
> > The following is for rank 0.
> > (gdb) bt
> > #0  0x00007fbdd1c93264 in opal_atomic_cmpset_32 (addr=0x7fbdd1eede24,
> >     oldval=1, newval=0) at ../opal/include/opal/sys/amd64/atomic.h:94
> > #1  0x00007fbdd1c93348 in opal_atomic_add_xx (addr=0x7fbdd1eede24,
> >     value=1, length=4) at ../opal/include/opal/sys/atomic_impl.h:243
> > #2  0x00007fbdd1c932ad in opal_progress () at runtime/opal_progress.c:171
> > #3  0x00007fbdd1f5c9ad in orte_plm_base_daemon_callback
> >     (num_daemons=1) at base/plm_base_launch_support.c:459
> > #4  0x00007fbdd0a5579d in orte_plm_rsh_launch (jdata=0x60f070) at
> >     plm_rsh_module.c:1221
> > #5  0x0000000000403821 in orterun (argc=15, argv=0x7fffda18a498) at
> >     orterun.c:748
> > #6  0x0000000000402dc7 in main (argc=15, argv=0x7fffda18a498) at main.c:13
> > ***********************************************************************
> > The following is for rank 1.
> > #0  0x0000003c4c20b309 in pthread_cond_wait@@GLIBC_2.3.2 () from
> >     /lib64/libpthread.so.0
> > #1  0x00007f6f8b04ba56 in opal_condition_wait (c=0x656ce0, m=0x656c88)
> >     at ../../../../opal/threads/condition.h:78
> > #2  0x00007f6f8b04b8b7 in orte_rml_oob_send (peer=0x7f6f8c578978,
> >     iov=0x7fff945798d0, count=1, tag=10, flags=16) at rml_oob_send.c:153
> > #3  0x00007f6f8b04c197 in orte_rml_oob_send_buffer
> >     (peer=0x7f6f8c578978, buffer=0x6563b0, tag=10, flags=0) at
> >     rml_oob_send.c:269
> > #4  0x00007f6f8c32fe24 in orte_daemon (argc=28, argv=0x7fff9457abd8)
> >     at orted/orted_main.c:610
> > #5  0x0000000000400917 in main (argc=28, argv=0x7fff9457abd8) at orted.c:62
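A note on the rank-0 backtrace: the amd64/atomic.h file is not AMD-specific. AMD64 (x86-64) is the instruction set that Intel Xeon CPUs such as the E5462 also implement, so the instruction choice itself is expected. opal_atomic_cmpset_32 is a 32-bit compare-and-swap, and opal_atomic_add_xx appears to use it in a retry loop to perform an atomic add. A rough sketch of that pattern, using the GCC/Clang __sync builtin in place of Open MPI's inline assembly (an illustration of the technique, not Open MPI's actual source):

    #include <stdint.h>
    #include <stdio.h>

    /* Build an atomic add from a 32-bit compare-and-swap: read the old
     * value, then try to install old+value; if another thread changed
     * *addr in the meantime, the swap fails and we retry. */
    static void atomic_add_32(volatile int32_t *addr, int32_t value)
    {
        int32_t oldval;
        do {
            oldval = *addr;
        } while (!__sync_bool_compare_and_swap(addr, oldval, oldval + value));
    }

    int main(void)
    {
        volatile int32_t counter = 0;
        atomic_add_32(&counter, 1);
        printf("counter = %d\n", (int)counter);
        return 0;
    }

So the trace appears to show opal_progress() updating a counter this way while orterun spins waiting for daemon callbacks; the hang seems more consistent with the incomplete progress-thread support Ralph describes above than with a wrong instruction set.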