On 2013-11-25, at 9:02 PM, Ralph Castain <rhc.open...@gmail.com> wrote:

> On Nov 25, 2013, at 5:04 PM, Pierre Jolivet <joli...@ann.jussieu.fr> wrote:
> 
>> 
>> On Nov 24, 2013, at 3:03 PM, Jed Brown <jedbr...@mcs.anl.gov> wrote:
>> 
>>> Ralph Castain <r...@open-mpi.org> writes:
>>> 
>>>> Given that we have no idea what Homebrew uses, I don't know how we
>>>> could clarify/respond.
>>> 
>> 
>> Ralph, it is pretty easy to know what Homebrew uses, cf. 
>> https://github.com/mxcl/homebrew/blob/master/Library/Formula/open-mpi.rb 
>> (sorry if you meant something else).
> 
> Might be a surprise, but I don't track all these guys :-)
> 
> Homebrew is new to me
> 
>> 
>>> Pierre provided a link to MacPorts saying that all of the following
>>> options were needed to properly enable threads.
>>> 
>>> --enable-event-thread-support --enable-opal-multi-threads 
>>> --enable-orte-progress-threads --enable-mpi-thread-multiple
>>> 
>>> If that is indeed the case, and if passing some subset of these options
>>> results in deadlock, it's not exactly user-friendly.
>>> 
>>> Maybe --enable-mpi-thread-multiple is enough, in which case MacPorts is
>>> doing something needlessly complicated and Pierre's link was a red
>>> herring?
>> 
>> That is very likely, though on the other hand, Homebrew is doing something 
>> pretty straightforward. I just wanted a quick and easy fix back when I had 
>> the same hanging issue, but there should be a better explanation if 
>> --enable-mpi-thread-multiple is indeed enough.
> 
> It is enough - we set all required things internally

Is that for sure? My original message stems from a hang in the PETSc tests, and 
I get quite different results depending on whether I compile Open MPI with 
--enable-mpi-thread-multiple alone or with all four flags.
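
For the record, the two builds I am comparing are roughly the following 
(installation prefix and other options elided):

        $ ./configure --enable-mpi-thread-multiple ...

versus

        $ ./configure --enable-event-thread-support --enable-opal-multi-threads \
                      --enable-orte-progress-threads --enable-mpi-thread-multiple ...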

I recompiled PETSc with debugging enabled against Open MPI built with the 
"correct" flags mentioned by Pierre, and this is the stack trace I get on each 
of the two ranks:

$ mpirun -n 2 xterm -e gdb ./ex5

        ^C
        Program received signal SIGINT, Interrupt.
        0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        (gdb) where
        #0  0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        #1  0x00007fff98d6ffb9 in ?? () from /usr/lib/system/libsystem_c.dylib


        ^C
        Program received signal SIGINT, Interrupt.
        0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        (gdb) where
        #0  0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        #1  0x00007fff98d6ffb9 in ?? () from /usr/lib/system/libsystem_c.dylib


If I instead recompile PETSc against Open MPI built with 
--enable-mpi-thread-multiple only (leaving out the other flags, which Pierre 
suggested is wrong), the two ranks give different traces:

        ^C
        Program received signal SIGINT, Interrupt.
        0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        (gdb) where
        #0  0x00007fff991160fa in __psynch_cvwait ()
           from /usr/lib/system/libsystem_kernel.dylib
        #1  0x00007fff98d6ffb9 in ?? () from /usr/lib/system/libsystem_c.dylib


        ^C
        Program received signal SIGINT, Interrupt.
        0x0000000101edca28 in mca_common_sm_init ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/libmca_common_sm.4.dylib
        (gdb) where
        #0  0x0000000101edca28 in mca_common_sm_init ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/libmca_common_sm.4.dylib
        #1  0x0000000101ed8a38 in mca_mpool_sm_init ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_mpool_sm.so
        #2  0x0000000101c383fa in mca_mpool_base_module_create ()
           from /usr/local/lib/libmpi.1.dylib
        #3  0x0000000102933b41 in mca_btl_sm_add_procs ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_btl_sm.so
        #4  0x0000000102929dfb in mca_bml_r2_add_procs ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_bml_r2.so
        #5  0x000000010290a59c in mca_pml_ob1_add_procs ()
           from /usr/local/Cellar/open-mpi/1.7.3/lib/openmpi/mca_pml_ob1.so
        #6  0x0000000101bd859b in ompi_mpi_init ()
           from /usr/local/lib/libmpi.1.dylib
        #7  0x0000000101bf24da in MPI_Init_thread ()
           from /usr/local/lib/libmpi.1.dylib
        #8  0x00000001000724db in PetscInitialize (argc=0x7fff5fbfed48, 
            args=0x7fff5fbfed40, file=0x0, 
            help=0x1000061c0 "Bratu nonlinear PDE in 2d.\nWe solve the  Bratu (SFI - solid fuel ignition) problem in a 2D rectangular\ndomain, using distributed arrays(DMDAs) to partition the parallel grid.\nThe command line options"...)
            at /tmp/petsc-3.4.3/src/sys/objects/pinit.c:675
        #9  0x0000000100000d8c in main ()


Line 675 of pinit.c is

        ierr = MPI_Init_thread(argc,args,MPI_THREAD_FUNNELED,&provided);CHKERRQ(ierr);
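
In case it helps isolate whether this is a PETSc or an Open MPI problem, here 
is a minimal standalone sketch (my own test program, not PETSc code) that makes 
the same call and prints the thread level the library actually grants:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int provided;
            /* Same request PETSc makes at pinit.c:675. */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            /* 'provided' may legitimately differ from the requested level,
               so report what was actually granted. */
            printf("requested MPI_THREAD_FUNNELED=%d, provided=%d\n",
                   MPI_THREAD_FUNNELED, provided);
            MPI_Finalize();
            return 0;
        }

Built with mpicc and run under the same "mpirun -n 2", this should show whether 
MPI_Init_thread itself hangs independently of PETSc.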


Dominique


