On Aug 16, 2010, at 10:05 AM, Eloi Gaudry wrote:

> I did run our application through valgrind but it couldn't find any "Invalid
> write": there is a bunch of "Invalid read" (I'm using 1.4.2 with the
> suppression file), "Use of uninitialized bytes" and "Conditional jump
> depending on uninitialized bytes" in different ompi routines. Some of them
> are located in btl_openib_component.c. I'll send you an output of valgrind
> shortly.

A lot of them in btl_openib_* are to be expected -- OpenFabrics uses OS-bypass methods for some of its memory, and therefore valgrind is unaware of them (and therefore incorrectly marks them as uninitialized).
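For reference, running the application under valgrind with the suppression file that ships with Open MPI usually looks something like the following (the installation prefix, process count, and application name are placeholders here; --track-origins helps trace "uninitialized bytes" reports back to their source):

    mpirun -np 2 valgrind \
        --suppressions=<prefix>/share/openmpi/openmpi-valgrind.supp \
        --track-origins=yes \
        ./my_app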
> Another question, you said that the callback function pointer should never
> be 0. But can the tag be null (hdr->tag)?

The tag is not a pointer -- it's just an integer.

> Thanks for your help,
> Eloi
>
> On 16/08/2010 18:22, Jeff Squyres wrote:
>> Sorry for the delay in replying.
>>
>> Odd; the values of the callback function pointer should never be 0. This
>> seems to suggest some kind of memory corruption is occurring.
>>
>> I don't know if it's possible, because the stack trace looks like you're
>> calling through python, but can you run this application through valgrind,
>> or some other memory-checking debugger?
>>
>> On Aug 10, 2010, at 7:15 AM, Eloi Gaudry wrote:
>>
>>> Hi,
>>>
>>> sorry, i just forgot to add the values of the function parameters:
>>> (gdb) print reg->cbdata
>>> $1 = (void *) 0x0
>>> (gdb) print openib_btl->super
>>> $2 = {btl_component = 0x2b341edd7380, btl_eager_limit = 12288,
>>> btl_rndv_eager_limit = 12288, btl_max_send_size = 65536,
>>> btl_rdma_pipeline_send_length = 1048576,
>>> btl_rdma_pipeline_frag_size = 1048576,
>>> btl_min_rdma_pipeline_size = 1060864, btl_exclusivity = 1024,
>>> btl_latency = 10, btl_bandwidth = 800, btl_flags = 310,
>>> btl_add_procs = 0x2b341eb8ee47 <mca_btl_openib_add_procs>,
>>> btl_del_procs = 0x2b341eb90156 <mca_btl_openib_del_procs>,
>>> btl_register = 0,
>>> btl_finalize = 0x2b341eb93186 <mca_btl_openib_finalize>,
>>> btl_alloc = 0x2b341eb90a3e <mca_btl_openib_alloc>,
>>> btl_free = 0x2b341eb91400 <mca_btl_openib_free>,
>>> btl_prepare_src = 0x2b341eb91813 <mca_btl_openib_prepare_src>,
>>> btl_prepare_dst = 0x2b341eb91f2e <mca_btl_openib_prepare_dst>,
>>> btl_send = 0x2b341eb94517 <mca_btl_openib_send>,
>>> btl_sendi = 0x2b341eb9340d <mca_btl_openib_sendi>,
>>> btl_put = 0x2b341eb94660 <mca_btl_openib_put>,
>>> btl_get = 0x2b341eb94c4e <mca_btl_openib_get>,
>>> btl_dump = 0x2b341acd45cb <mca_btl_base_dump>, btl_mpool = 0xf3f4110,
>>> btl_register_error = 0x2b341eb90565 <mca_btl_openib_register_error_cb>,
>>> btl_ft_event = 0x2b341eb952e7 <mca_btl_openib_ft_event>}
>>> (gdb) print hdr->tag
>>> $3 = 0 '\0'
>>> (gdb) print des
>>> $4 = (mca_btl_base_descriptor_t *) 0xf4a6700
>>> (gdb) print reg->cbfunc
>>> $5 = (mca_btl_base_module_recv_cb_fn_t) 0
>>>
>>> Eloi
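To make the failure mode concrete: the btl dispatches each incoming fragment through a table of callbacks indexed by the header's tag, conceptually like the sketch below (a simplified illustration, not the actual Open MPI source). With a corrupted, zeroed fragment header, hdr->tag is 0 and the corresponding slot was never registered, so cbfunc is NULL and the call jumps to address 0x0 -- exactly frame #0 in the backtrace that follows.

    /* Simplified model of tag-indexed active-message dispatch
     * (illustration only; names loosely follow the gdb output above). */
    #include <stdint.h>

    typedef void (*recv_cb_fn_t)(void *btl, uint8_t tag,
                                 void *des, void *cbdata);

    typedef struct {
        recv_cb_fn_t cbfunc;   /* NULL until someone registers this tag */
        void        *cbdata;
    } am_callback_t;

    static am_callback_t trigger_table[256];  /* indexed by the 8-bit tag */

    static void handle_incoming(void *btl, uint8_t tag, void *des)
    {
        am_callback_t *reg = &trigger_table[tag];
        /* A zeroed header gives tag 0; if that slot was never registered,
         * cbfunc is NULL and this call jumps to address 0x0. */
        reg->cbfunc(btl, tag, des, reg->cbdata);
    }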
>>> On Tuesday 10 August 2010 16:04:08 Eloi Gaudry wrote:
>>>> Hi,
>>>>
>>>> Here is the output of a core file generated during a segmentation fault
>>>> observed during a collective call (using openib):
>>>>
>>>> #0 0x0000000000000000 in ?? ()
>>>> (gdb) where
>>>> #0 0x0000000000000000 in ?? ()
>>>> #1 0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0,
>>>> ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
>>>> #2 0x00002aedbc4e25e2 in handle_wc (device=0x19024ac0, cq=0,
>>>> wc=0x7ffff279ce90) at btl_openib_component.c:3178
>>>> #3 0x00002aedbc4e2e9d in poll_device (device=0x19024ac0, count=2) at
>>>> btl_openib_component.c:3318
>>>> #4 0x00002aedbc4e34b8 in progress_one_device (device=0x19024ac0) at
>>>> btl_openib_component.c:3426
>>>> #5 0x00002aedbc4e3561 in btl_openib_component_progress () at
>>>> btl_openib_component.c:3451
>>>> #6 0x00002aedb8b22ab8 in opal_progress () at runtime/opal_progress.c:207
>>>> #7 0x00002aedb859f497 in opal_condition_wait (c=0x2aedb888ccc0,
>>>> m=0x2aedb888cd20) at ../opal/threads/condition.h:99
>>>> #8 0x00002aedb859fa31 in ompi_request_default_wait_all (count=2,
>>>> requests=0x7ffff279d0e0, statuses=0x0) at request/req_wait.c:262
>>>> #9 0x00002aedbd7559ad in ompi_coll_tuned_allreduce_intra_recursivedoubling
>>>> (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220,
>>>> op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at
>>>> coll_tuned_allreduce.c:223
>>>> #10 0x00002aedbd7514f7 in ompi_coll_tuned_allreduce_intra_dec_fixed
>>>> (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220,
>>>> op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at
>>>> coll_tuned_decision_fixed.c:63
>>>> #11 0x00002aedb85c7792 in PMPI_Allreduce (sendbuf=0x7ffff279d444,
>>>> recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20,
>>>> comm=0x19d81ff0) at pallreduce.c:102
>>>> #12 0x0000000004387dbf in FEMTown::MPI::Allreduce (sendbuf=0x7ffff279d444,
>>>> recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20,
>>>> comm=0x19d81ff0) at stubs.cpp:626
>>>> #13 0x0000000004058be8 in FEMTown::Domain::align (itf=
>>>> {<FEMTown::Boost::shared_base_ptr<FEMTown::Domain::Interface>> =
>>>> {_vptr.shared_base_ptr = 0x7ffff279d620, ptr_ = {px = 0x199942a4,
>>>> pn = {pi_ = 0x6}}}, <No data fields>}) at interface.cpp:371
>>>> #14 0x00000000040cb858 in FEMTown::Field::detail::align_itfs_and_neighbhors
>>>> (dim=2, set={px = 0x7ffff279d780, pn = {pi_ = 0x2f279d640}},
>>>> check_info=@0x7ffff279d7f0) at check.cpp:63
>>>> #15 0x00000000040cbfa8 in FEMTown::Field::align_elements (set=
>>>> {px = 0x7ffff279d950, pn = {pi_ = 0x66e08d0}}, check_info=@0x7ffff279d7f0)
>>>> at check.cpp:159
>>>> #16 0x00000000039acdd4 in PyField_align_elements (self=0x0,
>>>> args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:31
>>>> #17 0x0000000001fbf76d in FEMTown::Main::ExErrCatch<_object* (*)(_object*,
>>>> _object*, _object*)>::exec<_object> (this=0x7ffff279dc20, s=0x0,
>>>> po1=0x2aaab0765050, po2=0x19d2e950) at
>>>> /home/qa/svntop/femtown/modules/main/py/exception.hpp:463
>>>> #18 0x00000000039acc82 in PyField_align_elements_ewrap (self=0x0,
>>>> args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:39
>>>> #19 0x00000000044093a0 in PyEval_EvalFrameEx (f=0x19b52e90,
>>>> throwflag=<value optimized out>) at Python/ceval.c:3921
>>>> #20 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab754ad50,
>>>> globals=<value optimized out>, locals=<value optimized out>, args=0x3,
>>>> argcount=1, kws=0x19ace4a0, kwcount=2, defs=0x2aaab75e4800, defcount=2,
>>>> closure=0x0) at Python/ceval.c:2968
>>>> #21 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19ace2d0,
>>>> throwflag=<value optimized out>) at Python/ceval.c:3802
>>>> #22 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab7550120,
>>>> globals=<value optimized out>, locals=<value optimized out>, args=0x7,
>>>> argcount=1, kws=0x19acc418, kwcount=3, defs=0x2aaab759e958, defcount=6,
>>>> closure=0x0) at Python/ceval.c:2968
>>>> #23 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19acc1c0,
>>>> throwflag=<value optimized out>) at Python/ceval.c:3802
>>>> #24 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b5e738,
>>>> globals=<value optimized out>, locals=<value optimized out>, args=0x6,
>>>> argcount=1, kws=0x19abd328, kwcount=5, defs=0x2aaab891b7e8, defcount=3,
>>>> closure=0x0) at Python/ceval.c:2968
>>>> #25 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19abcea0,
>>>> throwflag=<value optimized out>) at Python/ceval.c:3802
>>>> #26 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4198,
>>>> globals=<value optimized out>, locals=<value optimized out>, args=0xb,
>>>> argcount=1, kws=0x19a89df0, kwcount=10, defs=0x0, defcount=0,
>>>> closure=0x0) at Python/ceval.c:2968
>>>> #27 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a89c40,
>>>> throwflag=<value optimized out>) at Python/ceval.c:3802
>>>> #28 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4288,
>>>> globals=<value optimized out>, locals=<value optimized out>, args=0x1,
>>>> argcount=0, kws=0x19a89330, kwcount=0, defs=0x2aaab8b66668, defcount=1,
>>>> closure=0x0) at Python/ceval.c:2968
>>>> #29 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a891b0,
>>>> throwflag=<value optimized out>) at Python/ceval.c:3802
>>>> #30 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b6a738,
>>>> globals=<value optimized out>, locals=<value optimized out>, args=0x0,
>>>> argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at
>>>> Python/ceval.c:2968
>>>> #31 0x000000000440ac02 in PyEval_EvalCode (co=0x1902f9b0, globals=0x0,
>>>> locals=0x190d9700) at Python/ceval.c:522
>>>> #32 0x000000000442853c in PyRun_StringFlags (str=0x192fd3d8
>>>> "DIRECT.Actran.main()", start=<value optimized out>, globals=0x192213d0,
>>>> locals=0x192213d0, flags=0x0) at Python/pythonrun.c:1335
>>>> #33 0x0000000004429690 in PyRun_SimpleStringFlags (command=0x192fd3d8
>>>> "DIRECT.Actran.main()", flags=0x0) at Python/pythonrun.c:957
>>>> #34 0x0000000001fa1cf9 in FEMTown::Python::FEMPy::run_application
>>>> (this=0x7ffff279f650) at fempy.cpp:873
>>>> #35 0x000000000434ce99 in FEMTown::Main::Batch::run (this=0x7ffff279f650)
>>>> at batch.cpp:374
>>>> #36 0x0000000001f9aa25 in main (argc=8, argv=0x7ffff279fa48) at main.cpp:10
>>>> (gdb) f 1
>>>> #1 0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0,
>>>> ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
>>>> 2881 reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
>>>> Current language: auto; currently c
>>>> (gdb) l
>>>> 2876
>>>> 2877 if(OPAL_LIKELY(!(is_credit_msg = is_credit_message(frag)))) {
>>>> 2878     /* call registered callback */
>>>> 2879     mca_btl_active_message_callback_t* reg;
>>>> 2880     reg = mca_btl_base_active_message_trigger + hdr->tag;
>>>> 2881     reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
>>>> 2882     if(MCA_BTL_OPENIB_RDMA_FRAG(frag)) {
>>>> 2883         cqp = (hdr->credits >> 11) & 0x0f;
>>>> 2884         hdr->credits &= 0x87ff;
>>>> 2885     } else {
>>>>
>>>> Regards,
>>>> Eloi
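For what it's worth, a hypothetical local debugging patch (not the stock Open MPI code) could guard the dispatch at line 2881 so the process fails loudly at the point of corruption instead of jumping through a NULL function pointer:

    /* Hypothetical debugging guard around btl_openib_component.c:2881;
     * reg, hdr, des, and openib_btl come from the surrounding function. */
    if (NULL == reg->cbfunc) {
        opal_output(0, "bad fragment? tag=%d has no registered callback",
                    (int) hdr->tag);
        abort();  /* dump core while the corrupted header is still visible */
    }
    reg->cbfunc(&openib_btl->super, hdr->tag, des, reg->cbdata);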
>>>> On Friday 16 July 2010 16:01:02 Eloi Gaudry wrote:
>>>>> Hi Edgar,
>>>>>
>>>>> The only difference I could observe was that the segmentation fault
>>>>> appeared sometimes later during the parallel computation.
>>>>>
>>>>> I'm running out of ideas here. I wish I could use "--mca coll tuned"
>>>>> together with "--mca btl self,sm,tcp" so that I could check that the
>>>>> issue is not somehow limited to the tuned collective routines.
>>>>>
>>>>> Thanks,
>>>>> Eloi
>>>>>
>>>>> On Thursday 15 July 2010 17:24:24 Edgar Gabriel wrote:
>>>>>> On 7/15/2010 10:18 AM, Eloi Gaudry wrote:
>>>>>>> hi edgar,
>>>>>>>
>>>>>>> thanks for the tips, I'm gonna try this option as well. the
>>>>>>> segmentation fault i'm observing always happened during a collective
>>>>>>> communication indeed... it basically switches all collective
>>>>>>> communication to basic mode, right?
>>>>>>>
>>>>>>> sorry for my ignorance, but what's a NCA ?
>>>>>>
>>>>>> sorry, I meant to type HCA (InfiniBand networking card)
>>>>>>
>>>>>> Thanks
>>>>>> Edgar
>>>>>>
>>>>>>> thanks,
>>>>>>> éloi
>>>>>>>
>>>>>>> On Thursday 15 July 2010 16:20:54 Edgar Gabriel wrote:
>>>>>>>> you could try first to use the algorithms in the basic module, e.g.
>>>>>>>>
>>>>>>>> mpirun -np x --mca coll basic ./mytest
>>>>>>>>
>>>>>>>> and see whether this makes a difference. I used to observe sometimes
>>>>>>>> a (similar ?) problem in the openib btl triggered from the tuned
>>>>>>>> collective component, in cases where the ofed libraries were
>>>>>>>> installed but no NCA was found on a node. It used to work however
>>>>>>>> with the basic component.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Edgar
>>>>>>>>
>>>>>>>> On 7/15/2010 3:08 AM, Eloi Gaudry wrote:
>>>>>>>>> hi Rolf,
>>>>>>>>>
>>>>>>>>> unfortunately, i couldn't get rid of that annoying segmentation
>>>>>>>>> fault when selecting another bcast algorithm. i'm now going to
>>>>>>>>> replace MPI_Bcast with a naive implementation (using MPI_Send and
>>>>>>>>> MPI_Recv) and see if that helps.
>>>>>>>>>
>>>>>>>>> regards,
>>>>>>>>> éloi
>>>>>>>>>
>>>>>>>>> On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:
>>>>>>>>>> Hi Rolf,
>>>>>>>>>>
>>>>>>>>>> thanks for your input. You're right, I missed the
>>>>>>>>>> coll_tuned_use_dynamic_rules option.
>>>>>>>>>>
>>>>>>>>>> I'll check whether the segmentation fault disappears when using
>>>>>>>>>> the basic bcast linear algorithm with the proper command line you
>>>>>>>>>> provided.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Eloi
>>>>>>>>>>
>>>>>>>>>> On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:
>>>>>>>>>>> Hi Eloi:
>>>>>>>>>>> To select the different bcast algorithms, you need to add an
>>>>>>>>>>> extra mca parameter that tells the library to use dynamic
>>>>>>>>>>> selection. --mca coll_tuned_use_dynamic_rules 1
>>>>>>>>>>>
>>>>>>>>>>> One way to make sure you are typing this in correctly is to use
>>>>>>>>>>> it with ompi_info. Do the following:
>>>>>>>>>>> ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll
>>>>>>>>>>>
>>>>>>>>>>> You should see lots of output with all the different algorithms
>>>>>>>>>>> that can be selected for the various collectives.
>>>>>>>>>>> Therefore, you need this:
>>>>>>>>>>>
>>>>>>>>>>> --mca coll_tuned_use_dynamic_rules 1 --mca
>>>>>>>>>>> coll_tuned_bcast_algorithm 1
>>>>>>>>>>>
>>>>>>>>>>> Rolf
>>>>>>>>>>>
>>>>>>>>>>> On 07/13/10 11:28, Eloi Gaudry wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I've found that "--mca coll_tuned_bcast_algorithm 1" allowed me
>>>>>>>>>>>> to switch to the basic linear algorithm. Anyway, whatever the
>>>>>>>>>>>> algorithm used, the segmentation fault remains.
>>>>>>>>>>>>
>>>>>>>>>>>> Could anyone give some advice on ways to diagnose the issue I'm
>>>>>>>>>>>> facing?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Eloi
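A minimal sketch of the kind of naive linear broadcast Eloi mentions above (the root sends to every other rank with MPI_Send; everyone else posts a matching MPI_Recv) could look like this; the tag value and error handling are arbitrary choices for the illustration:

    #include <mpi.h>

    /* Drop-in stand-in for MPI_Bcast while testing, to take the tuned
     * collective algorithms out of the picture entirely. */
    static int naive_bcast(void *buf, int count, MPI_Datatype type,
                           int root, MPI_Comm comm)
    {
        const int tag = 4242;  /* arbitrary tag for this illustration */
        int rank, size, peer;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        if (rank == root) {
            for (peer = 0; peer < size; ++peer) {
                if (peer != root) {
                    MPI_Send(buf, count, type, peer, tag, comm);
                }
            }
        } else {
            MPI_Recv(buf, count, type, root, tag, comm, MPI_STATUS_IGNORE);
        }
        return MPI_SUCCESS;
    }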
>>>>>>>>>>>>
>>>>>>>>>>>> On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm focusing on the MPI_Bcast routine that seems to randomly
>>>>>>>>>>>>> segfault when using the openib btl. I'd like to know if there
>>>>>>>>>>>>> is any way to make OpenMPI switch to a different algorithm
>>>>>>>>>>>>> than the one selected by default for MPI_Bcast.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help,
>>>>>>>>>>>>> Eloi
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm observing a random segmentation fault during an internode
>>>>>>>>>>>>>> parallel computation involving the openib btl and
>>>>>>>>>>>>>> OpenMPI-1.4.2 (the same issue can be observed with
>>>>>>>>>>>>>> OpenMPI-1.3.3):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mpirun (Open MPI) 1.4.2
>>>>>>>>>>>>>> Report bugs to http://www.open-mpi.org/community/help/
>>>>>>>>>>>>>> [pbn08:02624] *** Process received signal ***
>>>>>>>>>>>>>> [pbn08:02624] Signal: Segmentation fault (11)
>>>>>>>>>>>>>> [pbn08:02624] Signal code: Address not mapped (1)
>>>>>>>>>>>>>> [pbn08:02624] Failing at address: (nil)
>>>>>>>>>>>>>> [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
>>>>>>>>>>>>>> [pbn08:02624] *** End of error message ***
>>>>>>>>>>>>>> sh: line 1: 2624 Segmentation fault
>>>>>>>>>>>>>> /share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/bin/actranpy_mp
>>>>>>>>>>>>>> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872'
>>>>>>>>>>>>>> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
>>>>>>>>>>>>>> '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch'
>>>>>>>>>>>>>> '--mem=3200' '--threads=1' '--errorlevel=FATAL' '--t_max=0.1'
>>>>>>>>>>>>>> '--parallel=domain'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If I choose not to use the openib btl (by using --mca btl
>>>>>>>>>>>>>> self,sm,tcp on the command line, for instance), I don't
>>>>>>>>>>>>>> encounter any problem and the parallel computation runs
>>>>>>>>>>>>>> flawlessly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like to get some help to be able:
>>>>>>>>>>>>>> - to diagnose the issue I'm facing with the openib btl
>>>>>>>>>>>>>> - to understand why this issue is observed only when using
>>>>>>>>>>>>>>   the openib btl and not when using self,sm,tcp
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any help would be very much appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The outputs of ompi_info and of the OpenMPI configure script
>>>>>>>>>>>>>> are attached to this email, along with some information on
>>>>>>>>>>>>>> the InfiniBand drivers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the command line used when launching a parallel
>>>>>>>>>>>>>> computation using InfiniBand:
>>>>>>>>>>>>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
>>>>>>>>>>>>>> --mca btl openib,sm,self,tcp --display-map --verbose --version
>>>>>>>>>>>>>> --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0
>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and here is the command line used when not using InfiniBand:
>>>>>>>>>>>>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
>>>>>>>>>>>>>> --mca btl self,sm,tcp --display-map --verbose --version --mca
>>>>>>>>>>>>>> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Eloi
>>>
>>> --
>>>
>>> Eloi Gaudry
>>>
>>> Free Field Technologies
>>> Company Website: http://www.fft.be
>>> Company Phone: +32 10 487 959
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/