Hi Terry, I'm sorry to say that I might have missed a point here.
I've recently been relaunching all previously failing computations with the message coalescing feature switched off, and I saw the same hdr->tag=0 error several times, always during a collective call (MPI_Comm_create, MPI_Allreduce and MPI_Bcast, so far). And as soon as I switched back to the peer queue option I was previously using (--mca btl_openib_receive_queues P,65536,256,192,128 instead of --mca btl_openib_use_message_coalescing 0), all computations ran flawlessly.
As for the reproducer, I've already tried to write something, but so far I haven't succeeded in reproducing the hdr->tag=0 issue with it.
Eloi

On 24/09/2010 18:37, Terry Dontje wrote:
Eloi Gaudry wrote:

Interesting, though it looks to me like the segv in ticket 2352 would have happened on the send side instead of the receive side like you have. As to what to do next, it would be really nice to have some sort of reproducer that we can try and debug what is really going on. The only other thing to do without a reproducer is to inspect the code on the send side to figure out what might make it generate a 0 hdr->tag, or maybe instrument the send side to stop when it is about ready to send a 0 hdr->tag and see if we can see how the code got there.

Terry,

You were right, the error indeed seems to come from the message coalescing feature. If I turn it off using "--mca btl_openib_use_message_coalescing 0", I'm not able to observe the "hdr->tag=0" error. There are some Trac tickets associated with very similar errors (https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they are all closed (except https://svn.open-mpi.org/trac/ompi/ticket/2352, which might be related), aren't they? What would you suggest, Terry?

I might have some cycles to look at this Monday. --td

Eloi

On Friday 24 September 2010 16:00:26 Terry Dontje wrote:

Eloi Gaudry wrote:

Terry,

No, I haven't tried any other values than P,65536,256,192,128 yet. The reason why is quite simple: I've been reading and rereading this thread to understand the meaning of btl_openib_receive_queues, and I can't figure out why the default values seem to induce the hdr->tag=0 issue (http://www.open-mpi.org/community/lists/users/2009/01/7808.php).

Yeah, the size of the fragments and the number of them really should not cause this issue, so I too am a little perplexed about it.

Do you think that the default shared receive queue parameters are erroneous for this specific Mellanox card? Any help on finding the proper parameters would actually be much appreciated.

I don't necessarily think it is the queue size for a specific card, but more so the handling of the queues by the BTL when using certain sizes.
At least that is one gut feeling I have. In my mind, the tag being 0 means either something below OMPI is polluting the data fragment, or OMPI's internal protocol is somehow getting messed up. I can imagine (no empirical data here) that the queue sizes could change how the OMPI protocol sets things up. Another thing may be the coalescing feature in the openib BTL, which tries to gang multiple messages into one packet when resources are running low. I can see where changing the queue sizes might affect the coalescing, so it might be interesting to turn off the coalescing. You can do that by setting "--mca btl_openib_use_message_coalescing 0" on your mpirun line. If that doesn't solve the issue then obviously there must be something else going on :-). Note, the reason I am interested in this is that I am seeing a similar error condition (hdr->tag == 0) on a development system, though my failing case fails with np=8 using the connectivity test program, which is mainly point-to-point, and there is not a significant amount of data transfer going on either. --td

Eloi

On Friday 24 September 2010 14:27:07 you wrote:

That is interesting. So does the number of processes affect your runs at all? The times I've seen hdr->tag be 0 it has usually been due to protocol issues; the tag should never be 0. Have you tried receive_queue settings other than the default and the one you mention? I wonder whether a combination of the two receive queues causes a failure or not, something like P,128,256,192,128:P,65536,256,192,128. I am wondering if it is the first queue definition causing the issue, or possibly the SRQ defined in the default. --td

Eloi Gaudry wrote:

Hi Terry,

The messages being sent/received can be of any size, but the error seems to happen more often with small messages (such as an int being broadcast or allreduced). The failing communication differs from one run to another, but some spots are more likely to fail than others.
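Terry's description of coalescing above — multiple messages ganged into one packet when resources run low — can be pictured with a toy pack/unpack sketch. This is illustrative C only, not Open MPI's actual wire format: the header layout and all names (toy_hdr_t, toy_pack, toy_unpack) are invented for the example. A receiver walking such a buffer that hits a zero tag knows the stream is corrupt, which is essentially the hdr->tag == 0 symptom discussed here:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy per-message header: a tag plus a payload length.
 * Illustrative only -- NOT Open MPI's real coalescing format. */
typedef struct {
    uint8_t  tag;   /* dispatch tag; 0 is never valid */
    uint32_t len;   /* payload bytes that follow the header */
} toy_hdr_t;

static int toy_msgs_seen;   /* counts dispatched messages */

static void toy_cb(uint8_t tag, const void *payload, uint32_t len)
{
    (void)tag; (void)payload; (void)len;
    toy_msgs_seen++;
}

/* Append one message (header + payload) to a coalescing buffer;
 * returns the new end offset. */
static size_t toy_pack(uint8_t *buf, size_t off, uint8_t tag,
                       const void *payload, uint32_t len)
{
    toy_hdr_t hdr = { tag, len };
    memcpy(buf + off, &hdr, sizeof hdr);
    memcpy(buf + off + sizeof hdr, payload, len);
    return off + sizeof hdr + len;
}

/* Walk a coalesced buffer, dispatching each message by tag.
 * Returns 0 on success, -1 if a zero tag is seen (corrupt stream). */
static int toy_unpack(const uint8_t *buf, size_t total,
                      void (*cb)(uint8_t, const void *, uint32_t))
{
    size_t off = 0;
    while (off + sizeof(toy_hdr_t) <= total) {
        toy_hdr_t hdr;
        memcpy(&hdr, buf + off, sizeof hdr);
        if (hdr.tag == 0)
            return -1;          /* the "hdr->tag = 0" symptom */
        cb(hdr.tag, buf + off + sizeof hdr, hdr.len);
        off += sizeof hdr + hdr.len;
    }
    return 0;
}
```

The point of the sketch: if the packer and unpacker ever disagree about a length field, the unpacker lands on arbitrary bytes and can read a zero tag, exactly the kind of internal-protocol confusion Terry suspects.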
And as far as I know, they are always located next to a small-message communication (an int being broadcast, for instance). Other typical message sizes are ~10k but can be very much larger.

I've been checking the HCA being used; it's from Mellanox (with vendor_part_id=26428). There are no receive_queues parameters associated with it:

$ cat share/openmpi/mca-btl-openib-device-params.ini
[...]
# A.k.a. ConnectX
[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
vendor_part_id = 25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128
[..]

$ ompi_info --param btl openib --parsable | grep receive_queues
mca:btl:openib:param:btl_openib_receive_queues:value:P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32
mca:btl:openib:param:btl_openib_receive_queues:data_source:default value
mca:btl:openib:param:btl_openib_receive_queues:status:writable
mca:btl:openib:param:btl_openib_receive_queues:help:Colon-delimited, comma-delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4
mca:btl:openib:param:btl_openib_receive_queues:deprecated:no

I was wondering if these parameters (automatically computed at openib BTL init, from what I understood) were somehow incorrect, so I plugged in other values: "P,65536,256,192,128" (someone on the list used those values when encountering a different issue). Since then, I haven't been able to observe the segfault (occurring as hdr->tag = 0 in btl_openib_component.c:2881) yet.

Eloi

On Thursday 23 September 2010 23:33:48 Terry Dontje wrote:

Eloi, I am curious about your problem. Can you tell me what size of job it is? Does it always fail on the same bcast, or the same process?

Eloi Gaudry wrote:

Hi Nysal,

Thanks for your suggestions. I'm now able to get the checksum computed and redirected to stdout (I had forgotten the "-mca pml_base_verbose 5" option, you were right).
I haven't been able to observe the segmentation fault (with hdr->tag=0) so far when using pml csum, but I'll let you know if I do. I've got two other questions, which may be related to the error observed:

1/ does the maximum number of MPI_Comm that can be handled by OpenMPI somehow depend on the btl being used (i.e. if I'm using openib, may I use the same number of MPI_Comm objects as with tcp)? Is there something like MPI_COMM_MAX in OpenMPI?

2/ the segfault only appears during an MPI collective call, with very small messages (one int being broadcast, for instance); I followed the guidelines given at http://icl.cs.utk.edu/open-mpi/faq/?category=openfabrics#ib-small-message-rdma but the debug build of OpenMPI asserts if I use a min-size different from 255. Anyway, if I deactivate eager_rdma, the segfault remains. Does the openib btl handle very small messages differently (even with eager_rdma deactivated) than tcp?

Others on the list: does coalescing happen with non-eager_rdma? If so, then that would possibly be one difference between the openib btl and tcp, aside from the actual protocol used.

Is there a way to make sure that large messages and small messages are handled the same way?

Do you mean so they all look like eager messages? How large are the messages we are talking about here: 1K, 1M or 10M? --td

Regards, Eloi

On Friday 17 September 2010 17:57:17 Nysal Jan wrote:

Hi Eloi,

Create a debug build of OpenMPI (--enable-debug) and, while running with the csum PML, add "-mca pml_base_verbose 5" to the command line. This will print the checksum details for each fragment sent over the wire. I'm guessing it didn't catch anything because the BTL failed. The checksum verification is done in the PML, which the BTL calls via a callback function. In your case the PML callback is never called because the hdr->tag is invalid, so enabling checksum tracing also might not be of much use. Is it the first Bcast that fails or the nth Bcast, and what is the message size?
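Nysal's point above is that the csum PML checksums each fragment on the send side and verifies it in a PML callback on the receive side — a callback that never runs here because hdr->tag is invalid. A minimal illustration of that send/verify flow (toy rotate-and-xor checksum with invented names; this is not the csum PML's actual algorithm):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit rotate-and-xor checksum over a buffer. The real csum PML
 * has its own algorithm; this only illustrates the flow. */
static uint32_t toy_csum(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = ((sum << 1) | (sum >> 31)) ^ p[i];
    return sum;
}

/* Receive side: returns 1 if the fragment matches the checksum the
 * sender computed (i.e. it arrived intact), 0 otherwise. */
static int verify_fragment(const uint8_t *payload, size_t n,
                           uint32_t sent_sum)
{
    return toy_csum(payload, n) == sent_sum;
}
```

The catch Nysal describes: verification like this lives above the BTL, so if the BTL never dispatches the fragment (invalid tag), the check is never reached, and checksum tracing stays silent.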
I'm not sure what the problem could be at this moment. I'm afraid you will have to debug the BTL to find out more. --Nysal

On Fri, Sep 17, 2010 at 4:39 PM, Eloi Gaudry <e...@fft.be> wrote:

Hi Nysal,

Thanks for your response. I've so far been unable to write a test case that could illustrate the hdr->tag=0 error. Actually, I'm only observing this issue when running an internode computation involving InfiniBand hardware from Mellanox (MT25418, ConnectX IB DDR, PCIe 2.0 2.5GT/s, rev a0) with our time-domain software. I checked, double-checked, and rechecked again every MPI call performed during a parallel computation and I couldn't find any error so far. The fact that the very same parallel computation runs flawlessly when using tcp (with openib support disabled) might indicate that the issue is located somewhere inside the openib btl or at the hardware/driver level. I've just used the "-mca pml csum" option and I haven't seen any related messages (when hdr->tag=0 and the segfault occurs). Any suggestion?

Regards, Eloi

On Friday 17 September 2010 16:03:34 Nysal Jan wrote:

Hi Eloi,

Sorry for the delay in responding. I haven't read the entire email thread, but do you have a test case which can reproduce this error? Without that it will be difficult to nail down the cause. Just to clarify, I do not work for an iWARP vendor; I can certainly try to reproduce it on an IB system. There is also a PML called csum, which you can use via "-mca pml csum"; it will checksum the MPI messages and verify them at the receiver side for any data corruption. You can try using it to see if it is able to catch anything.

Regards --Nysal

On Thu, Sep 16, 2010 at 3:48 PM, Eloi Gaudry <e...@fft.be> wrote:

Hi Nysal,

I'm sorry to interrupt, but I was wondering if you had a chance to look at this error.
Regards, Eloi

--
Eloi Gaudry
Free Field Technologies
Company Website: http://www.fft.be
Company Phone: +32 10 487 959

---------- Forwarded message ----------
From: Eloi Gaudry <e...@fft.be>
To: Open MPI Users <us...@open-mpi.org>
Date: Wed, 15 Sep 2010 16:27:43 +0200
Subject: Re: [OMPI users] [openib] segfault when using openib btl

Hi,

I was wondering if anybody got a chance to have a look at this issue.

Regards, Eloi

On Wednesday 18 August 2010 09:16:26 Eloi Gaudry wrote:

Hi Jeff,

Please find enclosed the output (valgrind.out.gz) from /opt/openmpi-debug-1.4.2/bin/orterun -np 2 --host pbn11,pbn10 --mca btl openib,self --display-map --verbose --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 -tag-output /opt/valgrind-3.5.0/bin/valgrind --tool=memcheck --suppressions=/opt/openmpi-debug-1.4.2/share/openmpi/openmpi-valgrind.supp --suppressions=./suppressions.python.supp /opt/actran/bin/actranpy_mp ...

Thanks, Eloi

On Tuesday 17 August 2010 09:32:53 Eloi Gaudry wrote:

On Monday 16 August 2010 19:14:47 Jeff Squyres wrote:

On Aug 16, 2010, at 10:05 AM, Eloi Gaudry wrote:

I did run our application through valgrind but it couldn't find any "Invalid write": there is a bunch of "Invalid read" (I'm using 1.4.2 with the suppression file), "Use of uninitialized bytes" and "Conditional jump depending on uninitialized bytes" in different ompi routines. Some of them are located in btl_openib_component.c. I'll send you an output of valgrind shortly.

A lot of them in btl_openib_* are to be expected -- OpenFabrics uses OS-bypass methods for some of its memory, and therefore valgrind is unaware of them (and therefore incorrectly marks them as uninitialized).

Would it help if I used the upcoming 1.5 version of OpenMPI? I read that a huge effort has been made to clean up the valgrind output, but maybe this doesn't concern this btl (for the reasons you mentioned).

Another question: you said that the callback function pointer should never be 0.
But can the tag be null (hdr->tag)?

The tag is not a pointer -- it's just an integer.

I was wondering whether its value could legitimately be null. I'll send a valgrind output soon (I need to build libpython without pymalloc first).

Thanks, Eloi

Thanks for your help, Eloi

On 16/08/2010 18:22, Jeff Squyres wrote:

Sorry for the delay in replying. Odd; the values of the callback function pointer should never be 0. This seems to suggest some kind of memory corruption is occurring. I don't know if it's possible, because the stack trace looks like you're calling through python, but can you run this application through valgrind, or some other memory-checking debugger?

On Aug 10, 2010, at 7:15 AM, Eloi Gaudry wrote:

Hi,

Sorry, I just forgot to add the values of the function parameters:

(gdb) print reg->cbdata
$1 = (void *) 0x0
(gdb) print openib_btl->super
$2 = {btl_component = 0x2b341edd7380, btl_eager_limit = 12288, btl_rndv_eager_limit = 12288, btl_max_send_size = 65536, btl_rdma_pipeline_send_length = 1048576, btl_rdma_pipeline_frag_size = 1048576, btl_min_rdma_pipeline_size = 1060864, btl_exclusivity = 1024, btl_latency = 10, btl_bandwidth = 800, btl_flags = 310, btl_add_procs = 0x2b341eb8ee47 <mca_btl_openib_add_procs>, btl_del_procs = 0x2b341eb90156 <mca_btl_openib_del_procs>, btl_register = 0, btl_finalize = 0x2b341eb93186 <mca_btl_openib_finalize>, btl_alloc = 0x2b341eb90a3e <mca_btl_openib_alloc>, btl_free = 0x2b341eb91400 <mca_btl_openib_free>, btl_prepare_src = 0x2b341eb91813 <mca_btl_openib_prepare_src>, btl_prepare_dst = 0x2b341eb91f2e <mca_btl_openib_prepare_dst>, btl_send = 0x2b341eb94517 <mca_btl_openib_send>, btl_sendi = 0x2b341eb9340d <mca_btl_openib_sendi>, btl_put = 0x2b341eb94660 <mca_btl_openib_put>, btl_get = 0x2b341eb94c4e <mca_btl_openib_get>, btl_dump = 0x2b341acd45cb <mca_btl_base_dump>, btl_mpool = 0xf3f4110, btl_register_error = 0x2b341eb90565 <mca_btl_openib_register_error_cb>, btl_ft_event = 0x2b341eb952e7 <mca_btl_openib_ft_event>}
(gdb) print hdr->tag
$3 = 0 '\0'
(gdb) print des
$4 =
(mca_btl_base_descriptor_t *) 0xf4a6700
(gdb) print reg->cbfunc
$5 = (mca_btl_base_module_recv_cb_fn_t) 0

Eloi

On Tuesday 10 August 2010 16:04:08 Eloi Gaudry wrote:

Hi,

Here is the output of a core file generated during a segmentation fault observed during a collective call (using openib):

#0  0x0000000000000000 in ?? ()
(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
#2  0x00002aedbc4e25e2 in handle_wc (device=0x19024ac0, cq=0, wc=0x7ffff279ce90) at btl_openib_component.c:3178
#3  0x00002aedbc4e2e9d in poll_device (device=0x19024ac0, count=2) at btl_openib_component.c:3318
#4  0x00002aedbc4e34b8 in progress_one_device (device=0x19024ac0) at btl_openib_component.c:3426
#5  0x00002aedbc4e3561 in btl_openib_component_progress () at btl_openib_component.c:3451
#6  0x00002aedb8b22ab8 in opal_progress () at runtime/opal_progress.c:207
#7  0x00002aedb859f497 in opal_condition_wait (c=0x2aedb888ccc0, m=0x2aedb888cd20) at ../opal/threads/condition.h:99
#8  0x00002aedb859fa31 in ompi_request_default_wait_all (count=2, requests=0x7ffff279d0e0, statuses=0x0) at request/req_wait.c:262
#9  0x00002aedbd7559ad in ompi_coll_tuned_allreduce_intra_recursivedoubling (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220, op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at coll_tuned_allreduce.c:223
#10 0x00002aedbd7514f7 in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220, op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at coll_tuned_decision_fixed.c:63
#11 0x00002aedb85c7792 in PMPI_Allreduce (sendbuf=0x7ffff279d444, recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20, comm=0x19d81ff0) at pallreduce.c:102
#12 0x0000000004387dbf in FEMTown::MPI::Allreduce (sendbuf=0x7ffff279d444, recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20, comm=0x19d81ff0) at stubs.cpp:626
#13 0x0000000004058be8 in FEMTown::Domain::align (itf={<FEMTown::Boost::shared_base_ptr<FEMTown::Domain::Interface>> = {_vptr.shared_base_ptr = 0x7ffff279d620, ptr_ = {px = 0x199942a4, pn = {pi_ = 0x6}}}, <No data fields>}) at interface.cpp:371
#14 0x00000000040cb858 in FEMTown::Field::detail::align_itfs_and_neighbhors (dim=2, set={px = 0x7ffff279d780, pn = {pi_ = 0x2f279d640}}, check_info=@0x7ffff279d7f0) at check.cpp:63
#15 0x00000000040cbfa8 in FEMTown::Field::align_elements (set={px = 0x7ffff279d950, pn = {pi_ = 0x66e08d0}}, check_info=@0x7ffff279d7f0) at check.cpp:159
#16 0x00000000039acdd4 in PyField_align_elements (self=0x0, args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:31
#17 0x0000000001fbf76d in FEMTown::Main::ExErrCatch<_object* (*)(_object*, _object*, _object*)>::exec<_object> (this=0x7ffff279dc20, s=0x0, po1=0x2aaab0765050, po2=0x19d2e950) at /home/qa/svntop/femtown/modules/main/py/exception.hpp:463
#18 0x00000000039acc82 in PyField_align_elements_ewrap (self=0x0, args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:39
#19 0x00000000044093a0 in PyEval_EvalFrameEx (f=0x19b52e90, throwflag=<value optimized out>) at Python/ceval.c:3921
#20 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab754ad50, globals=<value optimized out>, locals=<value optimized out>, args=0x3, argcount=1, kws=0x19ace4a0, kwcount=2, defs=0x2aaab75e4800, defcount=2, closure=0x0) at Python/ceval.c:2968
#21 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19ace2d0, throwflag=<value optimized out>) at Python/ceval.c:3802
#22 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab7550120, globals=<value optimized out>, locals=<value optimized out>, args=0x7, argcount=1, kws=0x19acc418, kwcount=3, defs=0x2aaab759e958, defcount=6, closure=0x0) at Python/ceval.c:2968
#23 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19acc1c0, throwflag=<value optimized out>) at Python/ceval.c:3802
#24 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b5e738, globals=<value optimized out>, locals=<value optimized out>, args=0x6, argcount=1, kws=0x19abd328, kwcount=5, defs=0x2aaab891b7e8, defcount=3, closure=0x0) at Python/ceval.c:2968
#25 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19abcea0, throwflag=<value optimized out>) at Python/ceval.c:3802
#26 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4198, globals=<value optimized out>, locals=<value optimized out>, args=0xb, argcount=1, kws=0x19a89df0, kwcount=10, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#27 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a89c40, throwflag=<value optimized out>) at Python/ceval.c:3802
#28 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4288, globals=<value optimized out>, locals=<value optimized out>, args=0x1, argcount=0, kws=0x19a89330, kwcount=0, defs=0x2aaab8b66668, defcount=1, closure=0x0) at Python/ceval.c:2968
#29 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a891b0, throwflag=<value optimized out>) at Python/ceval.c:3802
#30 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b6a738, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#31 0x000000000440ac02 in PyEval_EvalCode (co=0x1902f9b0, globals=0x0, locals=0x190d9700) at Python/ceval.c:522
#32 0x000000000442853c in PyRun_StringFlags (str=0x192fd3d8 "DIRECT.Actran.main()", start=<value optimized out>, globals=0x192213d0, locals=0x192213d0, flags=0x0) at Python/pythonrun.c:1335
#33 0x0000000004429690 in PyRun_SimpleStringFlags (command=0x192fd3d8 "DIRECT.Actran.main()", flags=0x0) at Python/pythonrun.c:957
#34 0x0000000001fa1cf9 in FEMTown::Python::FEMPy::run_application (this=0x7ffff279f650) at fempy.cpp:873
#35 0x000000000434ce99 in FEMTown::Main::Batch::run (this=0x7ffff279f650) at batch.cpp:374
#36 0x0000000001f9aa25 in main (argc=8, argv=0x7ffff279fa48) at main.cpp:10
(gdb) f 1
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700,
byte_len=18) at btl_openib_component.c:2881
2881            reg->cbfunc(&openib_btl->super, hdr->tag, des, reg->cbdata);
Current language: auto; currently c
(gdb) l
2877        if(OPAL_LIKELY(!(is_credit_msg = is_credit_message(frag)))) {
2878            /* call registered callback */
2879            mca_btl_active_message_callback_t* reg;
2880            reg = mca_btl_base_active_message_trigger + hdr->tag;
2881            reg->cbfunc(&openib_btl->super, hdr->tag, des, reg->cbdata);
2882            if(MCA_BTL_OPENIB_RDMA_FRAG(frag)) {
2883                cqp = (hdr->credits >> 11) & 0x0f;
2884                hdr->credits &= 0x87ff;
2885            } else {

Regards, Eloi

On Friday 16 July 2010 16:01:02 Eloi Gaudry wrote:

Hi Edgar,

The only difference I could observe was that the segmentation fault sometimes appeared later during the parallel computation. I'm running out of ideas here. I wish I could use "--mca coll tuned" with "--mca btl self,sm,tcp" so that I could check that the issue is not somehow limited to the tuned collective routines.

Thanks, Eloi

On Thursday 15 July 2010 17:24:24 Edgar Gabriel wrote:

On 7/15/2010 10:18 AM, Eloi Gaudry wrote:

Hi Edgar, thanks for the tips, I'm gonna try this option as well. The segmentation fault I'm observing always happened during a collective communication indeed... It basically switches all collective communication to basic mode, right? Sorry for my ignorance, but what's a NCA?

Sorry, I meant to type HCA (InfiniBand networking card). Thanks, Edgar

Thanks, éloi

On Thursday 15 July 2010 16:20:54 Edgar Gabriel wrote:

You could try first to use the algorithms in the basic module, e.g.

mpirun -np x --mca coll basic ./mytest

and see whether this makes a difference. I used to sometimes observe a (similar?)
problem in the openib btl triggered from the tuned collective component, in cases where the OFED libraries were installed but no HCA was found on a node. It used to work, however, with the basic component.

Thanks, Edgar

On 7/15/2010 3:08 AM, Eloi Gaudry wrote:

Hi Rolf,

Unfortunately, I couldn't get rid of that annoying segmentation fault when selecting another bcast algorithm. I'm now going to replace MPI_Bcast with a naive implementation (using MPI_Send and MPI_Recv) and see if that helps.

Regards, éloi

On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:

Hi Rolf,

Thanks for your input. You're right, I missed the coll_tuned_use_dynamic_rules option. I'll check whether the segmentation fault disappears when using the basic bcast linear algorithm with the proper command line you provided.

Regards, Eloi

On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:

Hi Eloi:

To select the different bcast algorithms, you need to add an extra mca parameter that tells the library to use dynamic selection: --mca coll_tuned_use_dynamic_rules 1. One way to make sure you are typing this in correctly is to use it with ompi_info. Do the following:

ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll

You should see lots of output with all the different algorithms that can be selected for the various collectives. Therefore, you need this: --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1

Rolf

On 07/13/10 11:28, Eloi Gaudry wrote:

Hi,

I've found that "--mca coll_tuned_bcast_algorithm 1" allowed me to switch to the basic linear algorithm. Anyway, whatever algorithm is used, the segmentation fault remains. Could anyone give some advice on ways to diagnose the issue I'm facing?

Regards, Eloi

On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:

Hi,

I'm focusing on the MPI_Bcast routine, which seems to randomly segfault when using the openib btl. I'd like to know if there is any way to make OpenMPI switch to a different algorithm than the default one being selected for MPI_Bcast.
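The naive MPI_Bcast replacement Eloi mentions (the root sends to every other rank with MPI_Send; everyone else posts a matching MPI_Recv) can be sketched as below. To keep the sketch self-contained and runnable without an MPI installation, the MPI calls are abstracted behind function pointers; demo_send/demo_recv and the mailbox context are invented stand-ins, and a real replacement would forward them to MPI_Send/MPI_Recv with a reserved tag:

```c
#include <assert.h>
#include <string.h>

/* Transport hooks standing in for MPI_Send / MPI_Recv so the sketch is
 * self-contained; in a real replacement these would wrap the MPI calls. */
typedef void (*send_fn)(const void *buf, int nbytes, int dest, void *ctx);
typedef void (*recv_fn)(void *buf, int nbytes, int src, void *ctx);

/* Naive linear broadcast: the root sends the buffer to every other
 * rank; every other rank receives it from the root. */
static void naive_bcast(void *buf, int nbytes, int root, int rank, int size,
                        send_fn send, recv_fn recv, void *ctx)
{
    if (rank == root) {
        for (int r = 0; r < size; r++)
            if (r != root)
                send(buf, nbytes, r, ctx);
    } else {
        recv(buf, nbytes, root, ctx);
    }
}

/* Invented in-memory "transport" for demonstration: one mailbox slot
 * per rank, plus the rank of the endpoint currently running. */
typedef struct {
    unsigned char slot[8][64];
    int me;
} demo_ctx_t;

static void demo_send(const void *buf, int nbytes, int dest, void *ctx)
{
    demo_ctx_t *c = (demo_ctx_t *)ctx;
    memcpy(c->slot[dest], buf, (size_t)nbytes);
}

static void demo_recv(void *buf, int nbytes, int src, void *ctx)
{
    demo_ctx_t *c = (demo_ctx_t *)ctx;
    (void)src;                      /* demo: data already in our slot */
    memcpy(buf, c->slot[c->me], (size_t)nbytes);
}
```

Such a linear broadcast sidesteps the tuned collective component entirely, which is exactly why it is a useful experiment here: if the segfault disappears, the suspicion narrows to the tuned algorithms rather than point-to-point openib traffic.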
Thanks for your help, Eloi

On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:

Hi,

I'm observing a random segmentation fault during an internode parallel computation involving the openib btl and OpenMPI-1.4.2 (the same issue can be observed with OpenMPI-1.3.3).

mpirun (Open MPI) 1.4.2
Report bugs to http://www.open-mpi.org/community/help/
[pbn08:02624] *** Process received signal ***
[pbn08:02624] Signal: Segmentation fault (11)
[pbn08:02624] Signal code: Address not mapped (1)
[pbn08:02624] Failing at address: (nil)
[pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
[pbn08:02624] *** End of error message ***
sh: line 1:  2624 Segmentation fault '/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/bin/actranpy_mp' '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872' '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat' '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch' '--mem=3200' '--threads=1' '--errorlevel=FATAL' '--t_max=0.1' '--parallel=domain'

If I choose not to use the openib btl (by using --mca btl self,sm,tcp on the command line, for instance), I don't encounter any problem and the parallel computation runs flawlessly. I would like to get some help to be able:
- to diagnose the issue I'm facing with the openib btl
- to understand why this issue is observed only when using the openib btl and not when using self,sm,tcp

Any help would be very much appreciated. The outputs of ompi_info and the configure scripts of OpenMPI are enclosed with this email, and some information on the InfiniBand drivers as well.

Here is the command line used when launching a parallel computation using InfiniBand:

path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca btl openib,sm,self,tcp --display-map --verbose --version --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
and the command line used if not using InfiniBand:

path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca btl self,sm,tcp --display-map --verbose --version --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]

Thanks, Eloi

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users