This falls outside of my purview - I would suggest you post this question again with a different subject line, specifically mentioning the failure of MPI_Intercomm_merge, so that it attracts the attention of those with knowledge of that area.
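For reference, the workaround described in the quoted report below - exchanging a dummy point-to-point message across the intercommunicator before merging it - would look roughly like the following sketch. This is illustrative only: the i_am_server flag and the function name are assumptions, not taken from the attached client.c/server.c.

#include <mpi.h>

/* Exchange a dummy message between the two rank-0 processes of the
 * intercommunicator, then merge it.  The intercommunicator is assumed to
 * come from MPI_Comm_connect/MPI_Comm_accept as elsewhere in this thread. */
static int merge_with_warmup(MPI_Comm intercomm, int i_am_server,
                             MPI_Comm *intracomm)
{
    int local_rank, dummy = 0;

    MPI_Comm_rank(intercomm, &local_rank);      /* rank within the local group */
    if (local_rank == 0) {
        if (i_am_server)
            /* rank 0 of this group -> rank 0 of the remote group */
            MPI_Send(&dummy, 1, MPI_INT, 0, 0, intercomm);
        else
            MPI_Recv(&dummy, 1, MPI_INT, 0, 0, intercomm, MPI_STATUS_IGNORE);
    }
    /* both sides pass high = 0, matching the backtraces below */
    return MPI_Intercomm_merge(intercomm, 0, intracomm);
}

This only sidesteps the hang rather than explaining it, but it matches the observation below that a prior point-to-point exchange lets the merge complete.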
On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:

> So now I have a new question.
> When I run my server and a lot of clients on the same machine, everything looks fine.
>
> But when I try to run the clients on several machines, the most frequent scenario is:
> * the server is started on machine A
> * X (= 1, 4, 10, ...) clients are started on machine B and they connect successfully
> * the first client starting on machine C connects successfully to the server, but the whole grid hangs on MPI_Intercomm_merge (all the processes from the intercommunicator get there).
>
> As I said, that is the most frequent scenario. Sometimes I can connect the clients from several machines. Sometimes it hangs (always on MPI_Intercomm_merge) when connecting the clients from machine B.
> The interesting thing is that if, before MPI_Intercomm_merge, I send a dummy message on the intercommunicator from process rank 0 in one group to process rank 0 in the other one, it will not hang on MPI_Intercomm_merge.
>
> I've tried both versions, with and without the first patch (ompi-server as orted), but it doesn't change the behavior.
>
> I've attached gdb to my server; this is the backtrace:
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0x00637afc in sched_yield () from /lib/libc.so.6
> #2  0xf7e8ce31 in opal_progress () at ../../opal/runtime/opal_progress.c:220
> #3  0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at ../../opal/threads/condition.h:99
> #4  0xf7f60dee in ompi_request_default_wait_all (count=2, requests=0xff8d7754, statuses=0x0) at ../../ompi/request/req_wait.c:262
> #5  0xf7d3e221 in mca_coll_inter_allgatherv_inter (sbuf=0xff8d7824, scount=1, sdtype=0x8049200, rbuf=0xff8d77e0, rcounts=0x9783df8, disps=0x9755520, rdtype=0x8049200, comm=0x978c2a8, module=0x9794b08) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
> #6  0xf7f4c615 in ompi_comm_determine_first (intercomm=0x978c2a8, high=0) at ../../ompi/communicator/comm.c:1199
> #7  0xf7f8d1d9 in PMPI_Intercomm_merge (intercomm=0x978c2a8, high=0, newcomm=0xff8d78c0) at pintercomm_merge.c:84
> #8  0x0804893c in main (argc=Cannot access memory at address 0xf) at server.c:50
>
> And this is the backtrace from one of the clients:
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0x0064993b in poll () from /lib/libc.so.6
> #2  0xf7de027f in poll_dispatch (base=0x8643fb8, arg=0x86442d8, tv=0xff82299c) at ../../../opal/event/poll.c:168
> #3  0xf7dde4b2 in opal_event_base_loop (base=0x8643fb8, flags=2) at ../../../opal/event/event.c:807
> #4  0xf7dde34f in opal_event_loop (flags=2) at ../../../opal/event/event.c:730
> #5  0xf7dcfc77 in opal_progress () at ../../opal/runtime/opal_progress.c:189
> #6  0xf7ea80b8 in opal_condition_wait (c=0xf7f25160, m=0xf7f251a0) at ../../opal/threads/condition.h:99
> #7  0xf7ea7ff3 in ompi_request_wait_completion (req=0x8686680) at ../../ompi/request/request.h:375
> #8  0xf7ea7ef1 in ompi_request_default_wait (req_ptr=0xff822ae8, status=0x0) at ../../ompi/request/req_wait.c:37
> #9  0xf7c663a6 in ompi_coll_tuned_bcast_intra_generic (buffer=0xff822d20, original_count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, count_by_segment=1, tree=0x868b3d8) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:237
> #10 0xf7c668ea in ompi_coll_tuned_bcast_intra_binomial (buffer=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, segsize=0) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:368
> #11 0xf7c5af12 in ompi_coll_tuned_bcast_intra_dec_fixed (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700) at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256
> #12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at ../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44
> #13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64, scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188, disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300, module=0x86aae18) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134
> #14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300, high=0) at ../../ompi/communicator/comm.c:1199
> #15 0xf7ed7833 in PMPI_Intercomm_merge (intercomm=0x86aa300, high=0, newcomm=0xff8241d0) at pintercomm_merge.c:84
> #16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47
>
> What do you think may cause the problem?
>
>
> 2010/7/26 Ralph Castain <r...@open-mpi.org>: >> No problem at all - glad it works! >> >> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote: >> >>> Hi, >>> I'm very sorry, but the problem was on my side. My installation >>> process was not always taking the newest sources of openmpi. In this >>> case it hasn't installed the version with the latest patch. Now I >>> think everything works fine - I could run over 130 processes with no >>> problems. >>> I'm sorry again that I've wasted your time. And thank you for the patch. >>> >>> 2010/7/21 Ralph Castain <r...@open-mpi.org>: >>>> We're having some problem replicating this once my patches are applied. >>>> Can you send us your configure cmd? Just the output from "head config.log" >>>> will do for now. >>>> >>>> Thanks! >>>> >>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote: >>>> >>>>> My start script looks almost exactly the same as the one published by >>>>> Edgar, ie. the processes are starting one by one with no delay. >>>>> >>>>> 2010/7/20 Ralph Castain <r...@open-mpi.org>: >>>>>> Grzegorz: something occurred to me. When you start all these processes, >>>>>> how are you staggering their wireup? Are they flooding us, or are you >>>>>> time-shifting them a little? >>>>>> >>>>>> >>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote: >>>>>> >>>>>>> Hm, so I am not sure how to approach this. First of all, the test case >>>>>>> works for me. I used up to 80 clients, and for both optimized and >>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4 >>>>>>> series, but the communicator code is identical in both cases). Clearly, >>>>>>> the patch from Ralph is necessary to make it work. >>>>>>> >>>>>>> Additionally, I went through the communicator creation code for dynamic >>>>>>> communicators trying to find spots that could create problems. The only >>>>>>> place that I found the number 64 appear is the fortran-to-c mapping >>>>>>> arrays (e.g. for communicators), where the initial size of the table is >>>>>>> 64. I looked twice over the pointer-array code to see whether we could >>>>>>> have a problem there (since it is a key piece of the cid allocation code >>>>>>> for communicators), but I am fairly confident that it is correct. >>>>>>> >>>>>>> Note that we have other (non-dynamic) tests, where comm_set is called >>>>>>> 100,000 times, and the code per se does not seem to have a problem due >>>>>>> to being called too often. So I am not sure what else to look at.
>>>>>>> >>>>>>> Edgar >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote: >>>>>>>> As far as I can tell, it appears the problem is somewhere in our >>>>>>>> communicator setup. The people knowledgeable on that area are going to >>>>>>>> look into it later this week. >>>>>>>> >>>>>>>> I'm creating a ticket to track the problem and will copy you on it. >>>>>>>> >>>>>>>> >>>>>>>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote: >>>>>>>>> >>>>>>>>>> Bad news.. >>>>>>>>>> I've tried the latest patch with and without the prior one, but it >>>>>>>>>> hasn't changed anything. I've also tried using the old code but with >>>>>>>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also >>>>>>>>>> didn't >>>>>>>>>> help. >>>>>>>>>> While looking through the sources of openmpi-1.4.2 I couldn't find >>>>>>>>>> any >>>>>>>>>> call of the function ompi_dpm_base_mark_dyncomm. >>>>>>>>> >>>>>>>>> It isn't directly called - it shows in ompi_comm_set as >>>>>>>>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, >>>>>>>>> but I guess something else is also being hit. Have to look further... >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>> Just so you don't have to wait for 1.4.3 release, here is the patch >>>>>>>>>>> (doesn't include the prior patch). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote: >>>>>>>>>>> >>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>> Dug around a bit and found the problem!! >>>>>>>>>>>>> >>>>>>>>>>>>> I have no idea who or why this was done, but somebody set a limit >>>>>>>>>>>>> of 64 separate jobids in the dynamic init called by >>>>>>>>>>>>> ompi_comm_set, which builds the intercommunicator. Unfortunately, >>>>>>>>>>>>> they hard-wired the array size, but never check that size before >>>>>>>>>>>>> adding to it. >>>>>>>>>>>>> >>>>>>>>>>>>> So after 64 calls to connect_accept, you are overwriting other >>>>>>>>>>>>> areas of the code. As you found, hitting 66 causes it to segfault. >>>>>>>>>>>>> >>>>>>>>>>>>> I'll fix this on the developer's trunk (I'll also add that >>>>>>>>>>>>> original patch to it). Rather than my searching this thread in >>>>>>>>>>>>> detail, can you remind me what version you are using so I can >>>>>>>>>>>>> patch it too? >>>>>>>>>>>> >>>>>>>>>>>> I'm using 1.4.2 >>>>>>>>>>>> Thanks a lot and I'm looking forward for the patch. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your patience with this! >>>>>>>>>>>>> Ralph >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't change >>>>>>>>>>>>>> anything. >>>>>>>>>>>>>> Following your advice I've run my process using gdb. >>>>>>>>>>>>>> Unfortunately I >>>>>>>>>>>>>> didn't get anything more than: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>>>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)] >>>>>>>>>>>>>> 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> (gdb) bt >>>>>>>>>>>>>> #0 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>> #1 0xf7e3ba95 in connect_accept () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so >>>>>>>>>>>>>> #2 0xf7f62013 in PMPI_Comm_connect () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>> #3 0x080489ed in main (argc=825832753, argv=0x34393638) at >>>>>>>>>>>>>> client.c:43 >>>>>>>>>>>>>> >>>>>>>>>>>>>> What's more: when I've added a breakpoint on ompi_comm_set in >>>>>>>>>>>>>> 66th >>>>>>>>>>>>>> process and stepped a couple of instructions, one of the other >>>>>>>>>>>>>> processes crashed (as usualy on ompi_comm_set) earlier than 66th >>>>>>>>>>>>>> did. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Finally I decided to recompile openmpi using -g flag for gcc. In >>>>>>>>>>>>>> this >>>>>>>>>>>>>> case the 66 processes issue has gone! I was running my >>>>>>>>>>>>>> applications >>>>>>>>>>>>>> exactly the same way as previously (even without recompilation) >>>>>>>>>>>>>> and >>>>>>>>>>>>>> I've run successfully over 130 processes. >>>>>>>>>>>>>> When switching back to the openmpi compilation without -g it >>>>>>>>>>>>>> again segfaults. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas? I'm really confused. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves >>>>>>>>>>>>>>> the same way when spread across multiple machines, I would >>>>>>>>>>>>>>> suspect it is somewhere in your program itself. Given that the >>>>>>>>>>>>>>> segfault is in your process, can you use gdb to look at the >>>>>>>>>>>>>>> core file and see where and why it fails? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>> sorry for the late response, but I couldn't find free time >>>>>>>>>>>>>>>>>> to play >>>>>>>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. I've >>>>>>>>>>>>>>>>>> launched >>>>>>>>>>>>>>>>>> my processes in the way you've described and I think it's >>>>>>>>>>>>>>>>>> working as >>>>>>>>>>>>>>>>>> you expected. None of my processes runs the orted daemon and >>>>>>>>>>>>>>>>>> they can >>>>>>>>>>>>>>>>>> perform MPI operations. Unfortunately I'm still hitting the >>>>>>>>>>>>>>>>>> 65 >>>>>>>>>>>>>>>>>> processes issue :( >>>>>>>>>>>>>>>>>> Maybe I'm doing something wrong. >>>>>>>>>>>>>>>>>> I attach my source code. If anybody could have a look on >>>>>>>>>>>>>>>>>> this, I would >>>>>>>>>>>>>>>>>> be grateful. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> When I run that code with clients_count <= 65 everything >>>>>>>>>>>>>>>>>> works fine: >>>>>>>>>>>>>>>>>> all the processes create a common grid, exchange some >>>>>>>>>>>>>>>>>> information and >>>>>>>>>>>>>>>>>> disconnect. >>>>>>>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on >>>>>>>>>>>>>>>>>> MPI_Comm_connect (segmentation fault). 
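For context, the pattern being debugged in this thread - growing one intracommunicator by accepting singleton clients one at a time - typically looks like the sketch below. The function names and port handling are assumptions for illustration and are not taken from the attached client.c/server.c.

#include <mpi.h>

/* Accepting side: every process already in 'grid' must take part in the
 * accept, then everyone merges the resulting intercommunicator.  The merged
 * communicator stays valid after the intercommunicator is disconnected. */
static MPI_Comm accept_one_client(MPI_Comm grid, const char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_accept(port, MPI_INFO_NULL, 0 /* root */, grid, &inter);
    MPI_Intercomm_merge(inter, 0 /* high, as in the backtraces */, &merged);
    MPI_Comm_disconnect(&inter);
    return merged;                 /* the grid, grown by one process */
}

/* Joining side: a singleton connects with MPI_COMM_SELF as its local comm. */
static MPI_Comm join_grid(const char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 0, &merged);
    MPI_Comm_disconnect(&inter);
    return merged;
}

Note that MPI_Comm_accept is collective over everything already in the grid, so every previously connected process has to participate each time a new client joins, and the cost of each join grows with the size of the grid.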
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I didn't have time to check the code, but my guess is that >>>>>>>>>>>>>>>>> you are still hitting some kind of file descriptor or other >>>>>>>>>>>>>>>>> limit. Check to see what your limits are - usually "ulimit" >>>>>>>>>>>>>>>>> will tell you. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My limitations are: >>>>>>>>>>>>>>>> time(seconds) unlimited >>>>>>>>>>>>>>>> file(blocks) unlimited >>>>>>>>>>>>>>>> data(kb) unlimited >>>>>>>>>>>>>>>> stack(kb) 10240 >>>>>>>>>>>>>>>> coredump(blocks) 0 >>>>>>>>>>>>>>>> memory(kb) unlimited >>>>>>>>>>>>>>>> locked memory(kb) 64 >>>>>>>>>>>>>>>> process 200704 >>>>>>>>>>>>>>>> nofiles 1024 >>>>>>>>>>>>>>>> vmemory(kb) unlimited >>>>>>>>>>>>>>>> locks unlimited >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Which one do you think could be responsible for that? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I was trying to run all the 66 processes on one machine or >>>>>>>>>>>>>>>> spread them >>>>>>>>>>>>>>>> across several machines and it always crashes the same way on >>>>>>>>>>>>>>>> the 66th >>>>>>>>>>>>>>>> process. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Another thing I would like to know is if it's normal that >>>>>>>>>>>>>>>>>> any of my >>>>>>>>>>>>>>>>>> processes when calling MPI_Comm_connect or MPI_Comm_accept >>>>>>>>>>>>>>>>>> when the >>>>>>>>>>>>>>>>>> other side is not ready, is eating up a full CPU available. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes - the waiting process is polling in a tight loop waiting >>>>>>>>>>>>>>>>> for the connection to be made. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Any help would be appreciated, >>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does >>>>>>>>>>>>>>>>>>> pretty much what you >>>>>>>>>>>>>>>>>>> want. Checkout "man ompi-server". I originally wrote that >>>>>>>>>>>>>>>>>>> code to support >>>>>>>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but we >>>>>>>>>>>>>>>>>>> can utilize it >>>>>>>>>>>>>>>>>>> here too. Have to blame me for not making it more publicly >>>>>>>>>>>>>>>>>>> known. >>>>>>>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the >>>>>>>>>>>>>>>>>>> singleton startup >>>>>>>>>>>>>>>>>>> to provide your desired support. This solution works in the >>>>>>>>>>>>>>>>>>> following >>>>>>>>>>>>>>>>>>> manner: >>>>>>>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This starts >>>>>>>>>>>>>>>>>>> a persistent >>>>>>>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous point >>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> independently started applications. The problem with >>>>>>>>>>>>>>>>>>> starting different >>>>>>>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies in >>>>>>>>>>>>>>>>>>> the need to have >>>>>>>>>>>>>>>>>>> the applications find each other. If they can't discover >>>>>>>>>>>>>>>>>>> contact info for >>>>>>>>>>>>>>>>>>> the other app, then they can't wire up their interconnects. >>>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I don't >>>>>>>>>>>>>>>>>>> like that >>>>>>>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out. >>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename>" in the >>>>>>>>>>>>>>>>>>> environment where you >>>>>>>>>>>>>>>>>>> will start your processes. 
This will allow your singleton >>>>>>>>>>>>>>>>>>> processes to find >>>>>>>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to >>>>>>>>>>>>>>>>>>> connect the MPI >>>>>>>>>>>>>>>>>>> publish/subscribe system for you. >>>>>>>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, >>>>>>>>>>>>>>>>>>> they will detect >>>>>>>>>>>>>>>>>>> the presence of the above envar and automatically connect >>>>>>>>>>>>>>>>>>> themselves to the >>>>>>>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with the >>>>>>>>>>>>>>>>>>> ability to perform >>>>>>>>>>>>>>>>>>> any MPI-2 operation. >>>>>>>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully it >>>>>>>>>>>>>>>>>>> will meet your >>>>>>>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so >>>>>>>>>>>>>>>>>>> long as you locate >>>>>>>>>>>>>>>>>>> it where all of the processes can find the contact file and >>>>>>>>>>>>>>>>>>> can open a TCP >>>>>>>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple >>>>>>>>>>>>>>>>>>> ompi-servers into a >>>>>>>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot >>>>>>>>>>>>>>>>>>> directly access a >>>>>>>>>>>>>>>>>>> server due to network segmentation), but it's a tad tricky >>>>>>>>>>>>>>>>>>> - let me know if >>>>>>>>>>>>>>>>>>> you require it and I'll try to help. >>>>>>>>>>>>>>>>>>> If you have trouble wiring them all into a single >>>>>>>>>>>>>>>>>>> communicator, you might >>>>>>>>>>>>>>>>>>> ask separately about that and see if one of our MPI experts >>>>>>>>>>>>>>>>>>> can provide >>>>>>>>>>>>>>>>>>> advice (I'm just the RTE grunt). >>>>>>>>>>>>>>>>>>> HTH - let me know how this works for you and I'll >>>>>>>>>>>>>>>>>>> incorporate it into future >>>>>>>>>>>>>>>>>>> OMPI releases. >>>>>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this our >>>>>>>>>>>>>>>>>>> small >>>>>>>>>>>>>>>>>>> project/experiment. >>>>>>>>>>>>>>>>>>> We definitely would like to give your patch a try. But >>>>>>>>>>>>>>>>>>> could you please >>>>>>>>>>>>>>>>>>> explain your solution a little more? >>>>>>>>>>>>>>>>>>> You still would like to start one mpirun per mpi grid, and >>>>>>>>>>>>>>>>>>> then have >>>>>>>>>>>>>>>>>>> processes started by us to join the MPI comm? >>>>>>>>>>>>>>>>>>> It is a good solution of course. >>>>>>>>>>>>>>>>>>> But it would be especially preferable to have one daemon >>>>>>>>>>>>>>>>>>> running >>>>>>>>>>>>>>>>>>> persistently on our "entry" machine that can handle several >>>>>>>>>>>>>>>>>>> mpi grid starts. >>>>>>>>>>>>>>>>>>> Can your patch help us this way too? >>>>>>>>>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>>>>>>>>> Krzysztof >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In thinking about this, my proposed solution won't >>>>>>>>>>>>>>>>>>>> entirely fix the >>>>>>>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I >>>>>>>>>>>>>>>>>>>> believe I can >>>>>>>>>>>>>>>>>>>> resolve that one as well, but it would require a patch. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Would you like me to send you something you could try? >>>>>>>>>>>>>>>>>>>> Might take a couple >>>>>>>>>>>>>>>>>>>> of iterations to get it right... 
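The ompi-server rendezvous described in the numbered steps above is also what MPI-2 name publishing needs when the applications are started independently. A minimal sketch of that usage follows; the service name "grid-server" is an assumption, and this shows one common way to use the rendezvous rather than what the attached sources actually do.

#include <mpi.h>

/* With OMPI_MCA_orte_server pointing at the ompi-server URI file (step 2
 * above), independently started singletons can meet through name publishing. */

static void server_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("grid-server", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
}

static void client_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Lookup_name("grid-server", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
}

The ompi-server daemon itself only gives MPI_Publish_name/MPI_Lookup_name a common meeting point when the jobs were not started under one mpirun.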
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee >>>>>>>>>>>>>>>>>>>>> it: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using >>>>>>>>>>>>>>>>>>>>> mpirun that includes >>>>>>>>>>>>>>>>>>>>> the following option: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> mpirun -report-uri file >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> where file is some filename that mpirun can create and >>>>>>>>>>>>>>>>>>>>> insert its >>>>>>>>>>>>>>>>>>>>> contact info into it. This can be a relative or absolute >>>>>>>>>>>>>>>>>>>>> path. This process >>>>>>>>>>>>>>>>>>>>> must remain alive throughout your application - doesn't >>>>>>>>>>>>>>>>>>>>> matter what it does. >>>>>>>>>>>>>>>>>>>>> It's purpose is solely to keep mpirun alive. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your >>>>>>>>>>>>>>>>>>>>> environment, where >>>>>>>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your >>>>>>>>>>>>>>>>>>>>> processes how to >>>>>>>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to handle >>>>>>>>>>>>>>>>>>>>> the connect/accept >>>>>>>>>>>>>>>>>>>>> operations >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to >>>>>>>>>>>>>>>>>>>>> each other. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that >>>>>>>>>>>>>>>>>>>>> these processes >>>>>>>>>>>>>>>>>>>>> will all have the same rank && name since they all start >>>>>>>>>>>>>>>>>>>>> as singletons. >>>>>>>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some >>>>>>>>>>>>>>>>>>>>>> process that I >>>>>>>>>>>>>>>>>>>>>> could run once on my system and it could help in >>>>>>>>>>>>>>>>>>>>>> creating those >>>>>>>>>>>>>>>>>>>>>> groups. >>>>>>>>>>>>>>>>>>>>>> My typical scenario is: >>>>>>>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun >>>>>>>>>>>>>>>>>>>>>> 2. connect them into MPI group >>>>>>>>>>>>>>>>>>>>>> 3. do some job >>>>>>>>>>>>>>>>>>>>>> 4. exit all N processes >>>>>>>>>>>>>>>>>>>>>> 5. goto 1 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>: >>>>>>>>>>>>>>>>>>>>>>> Thank you Ralph for your explanation. >>>>>>>>>>>>>>>>>>>>>>> And, apart from that descriptors' issue, is there any >>>>>>>>>>>>>>>>>>>>>>> other way to >>>>>>>>>>>>>>>>>>>>>>> solve my problem, i.e. to run separately a number of >>>>>>>>>>>>>>>>>>>>>>> processes, >>>>>>>>>>>>>>>>>>>>>>> without mpirun and then to collect them into an MPI >>>>>>>>>>>>>>>>>>>>>>> intracomm group? >>>>>>>>>>>>>>>>>>>>>>> If I for example would need to run some 'server >>>>>>>>>>>>>>>>>>>>>>> process' (even using >>>>>>>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>> Okay, but here is the problem. 
If you don't use >>>>>>>>>>>>>>>>>>>>>>>> mpirun, and are not >>>>>>>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" >>>>>>>>>>>>>>>>>>>>>>>> launch (i.e., starting >>>>>>>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of those >>>>>>>>>>>>>>>>>>>>>>>> processes thinks it is >>>>>>>>>>>>>>>>>>>>>>>> a singleton - yes? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton >>>>>>>>>>>>>>>>>>>>>>>> immediately >>>>>>>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to >>>>>>>>>>>>>>>>>>>>>>>> behave just like mpirun. >>>>>>>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 operations >>>>>>>>>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are >>>>>>>>>>>>>>>>>>>>>>>> singletons, then >>>>>>>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This eats >>>>>>>>>>>>>>>>>>>>>>>> up a lot of file >>>>>>>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting >>>>>>>>>>>>>>>>>>>>>>>> this 65 process limit - >>>>>>>>>>>>>>>>>>>>>>>> your system is probably running out of file >>>>>>>>>>>>>>>>>>>>>>>> descriptors. You might check you >>>>>>>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised >>>>>>>>>>>>>>>>>>>>>>>> upward. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use some >>>>>>>>>>>>>>>>>>>>>>>>> special way for >>>>>>>>>>>>>>>>>>>>>>>>> running my processes provided by the environment in >>>>>>>>>>>>>>>>>>>>>>>>> which I'm >>>>>>>>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>>>>>>>> and unfortunately I can't use mpirun. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - >>>>>>>>>>>>>>>>>>>>>>>>>> all it does is >>>>>>>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. It >>>>>>>>>>>>>>>>>>>>>>>>>> mainly sits there >>>>>>>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to >>>>>>>>>>>>>>>>>>>>>>>>>> support the job. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, >>>>>>>>>>>>>>>>>>>>>>>>>> I know of no >>>>>>>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes >>>>>>>>>>>>>>>>>>>>>>>>>>> communicating >>>>>>>>>>>>>>>>>>>>>>>>>>> via >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without mpirun >>>>>>>>>>>>>>>>>>>>>>>>>>> and create >>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator after the startup. Any ideas how >>>>>>>>>>>>>>>>>>>>>>>>>>> to do this >>>>>>>>>>>>>>>>>>>>>>>>>>> efficiently? 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes >>>>>>>>>>>>>>>>>>>>>>>>>>> are connecting >>>>>>>>>>>>>>>>>>>>>>>>>>> one by >>>>>>>>>>>>>>>>>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately all >>>>>>>>>>>>>>>>>>>>>>>>>>> the processes >>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>> are already in the group need to call >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. This means >>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect I need to >>>>>>>>>>>>>>>>>>>>>>>>>>> collect all the >>>>>>>>>>>>>>>>>>>>>>>>>>> n-1 >>>>>>>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I run >>>>>>>>>>>>>>>>>>>>>>>>>>> about 40 >>>>>>>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, >>>>>>>>>>>>>>>>>>>>>>>>>>> which I'd like to >>>>>>>>>>>>>>>>>>>>>>>>>>> avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I try >>>>>>>>>>>>>>>>>>>>>>>>>>> to connect >>>>>>>>>>>>>>>>>>>>>>>>>>> 66-th >>>>>>>>>>>>>>>>>>>>>>>>>>> process the root of the existing group segfaults on >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. >>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird as everything >>>>>>>>>>>>>>>>>>>>>>>>>>> works fine for at >>>>>>>>>>>>>>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>>>>>>>>>>>>> 65 processes. Is there any limitation I don't know >>>>>>>>>>>>>>>>>>>>>>>>>>> about? >>>>>>>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I >>>>>>>>>>>>>>>>>>>>>>>>>>> run my processes >>>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same as >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_COMM_SELF. >>>>>>>>>>>>>>>>>>>>>>>>>>> Is >>>>>>>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it >>>>>>>>>>>>>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created? 
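On the MPI_COMM_WORLD question raised just above: a singleton's MPI_COMM_WORLD contains only the process itself, and standard MPI provides no way to reassign it. The usual approach is to keep the merged intracommunicator around (here a global named grid_comm, which is an assumed name for illustration) and use it wherever MPI_COMM_WORLD would otherwise appear.

#include <stdio.h>
#include <mpi.h>

static MPI_Comm grid_comm = MPI_COMM_NULL;   /* assumed global, set right after the merge */

/* Call only after grid_comm has been set to the merged intracommunicator. */
static void report_sizes(void)
{
    int world_size, grid_size;

    MPI_Comm_size(MPI_COMM_WORLD, &world_size);  /* 1 for a singleton */
    MPI_Comm_size(grid_comm, &grid_size);        /* everyone that has merged in */
    printf("world=%d grid=%d\n", world_size, grid_size);
}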
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>>>>>> <client.c><server.c>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users