Based on the output you show here, there is absolutely nothing wrong (yet). Both processes are in the same function and are doing what they are supposed to do.
However, I am fairly sure that the client process whose backtrace you show is already part of current_intracomm. Could you take a backtrace of a process that is not yet part of current_intracomm? (If I understand your code correctly, the intercommunicator is in an n-to-1 configuration, with each client process becoming part of n after the intercomm_merge.) It would be interesting to see where that process is...

Thanks
Edgar

On 7/27/2010 1:42 PM, Ralph Castain wrote:
> This slides outside of my purview - I would suggest you post this question
> with a different subject line specifically mentioning failure of
> intercomm_merge to work so it attracts the attention of those with knowledge
> of that area.
>
> On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:
>
>> So now I have a new question.
>> When I run my server and a lot of clients on the same machine,
>> everything looks fine.
>>
>> But when I try to run the clients on several machines, the most
>> frequent scenario is:
>> * the server is started on machine A
>> * X (= 1, 4, 10, ...) clients are started on machine B and they connect
>> successfully
>> * the first client starting on machine C connects successfully to the
>> server, but the whole grid hangs in MPI_Intercomm_merge (all the processes
>> from the intercommunicator get there).
>>
>> As I said, that is the most frequent scenario. Sometimes I can connect the
>> clients from several machines. Sometimes it hangs (always in
>> MPI_Intercomm_merge) when connecting the clients from machine B.
>> The interesting thing is that if, before MPI_Intercomm_merge, I send a dummy
>> message on the intercommunicator from process rank 0 in one group to
>> process rank 0 in the other one, it does not hang in MPI_Intercomm_merge.
>>
>> I've tried both versions, with and without the first patch (ompi-server
>> as orted), but it doesn't change the behavior.
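For reference, the workaround described above amounts to something like the following C sketch: before merging, rank 0 of one group sends a dummy message to rank 0 of the other group over the intercommunicator. The function name, the tag, and the choice of which side sends are illustrative assumptions, not code taken from the actual client.c/server.c.

#include <mpi.h>

/* Sketch of the reported workaround: exchange a dummy message between
 * rank 0 of the two groups on the intercommunicator, then merge.
 * 'is_server' only decides which side sends; tag 0 is arbitrary. */
static void merge_with_dummy_exchange(MPI_Comm intercomm, int is_server,
                                      MPI_Comm *intracomm)
{
    int rank, dummy = 0;

    MPI_Comm_rank(intercomm, &rank);      /* rank within the local group */
    if (rank == 0) {
        if (is_server)
            MPI_Send(&dummy, 1, MPI_INT, 0, 0, intercomm);
        else
            MPI_Recv(&dummy, 1, MPI_INT, 0, 0, intercomm, MPI_STATUS_IGNORE);
    }

    /* both sides pass high=0 here, as in the backtraces below */
    MPI_Intercomm_merge(intercomm, 0, intracomm);
}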
>>
>> I've attached gdb to my server, this is bt:
>> #0  0xffffe410 in __kernel_vsyscall ()
>> #1  0x00637afc in sched_yield () from /lib/libc.so.6
>> #2  0xf7e8ce31 in opal_progress () at ../../opal/runtime/opal_progress.c:220
>> #3  0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at ../../opal/threads/condition.h:99
>> #4  0xf7f60dee in ompi_request_default_wait_all (count=2, requests=0xff8d7754, statuses=0x0) at ../../ompi/request/req_wait.c:262
>> #5  0xf7d3e221 in mca_coll_inter_allgatherv_inter (sbuf=0xff8d7824, scount=1, sdtype=0x8049200, rbuf=0xff8d77e0, rcounts=0x9783df8, disps=0x9755520, rdtype=0x8049200, comm=0x978c2a8, module=0x9794b08) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
>> #6  0xf7f4c615 in ompi_comm_determine_first (intercomm=0x978c2a8, high=0) at ../../ompi/communicator/comm.c:1199
>> #7  0xf7f8d1d9 in PMPI_Intercomm_merge (intercomm=0x978c2a8, high=0, newcomm=0xff8d78c0) at pintercomm_merge.c:84
>> #8  0x0804893c in main (argc=Cannot access memory at address 0xf) at server.c:50
>>
>> And this is bt from one of the clients:
>> #0  0xffffe410 in __kernel_vsyscall ()
>> #1  0x0064993b in poll () from /lib/libc.so.6
>> #2  0xf7de027f in poll_dispatch (base=0x8643fb8, arg=0x86442d8, tv=0xff82299c) at ../../../opal/event/poll.c:168
>> #3  0xf7dde4b2 in opal_event_base_loop (base=0x8643fb8, flags=2) at ../../../opal/event/event.c:807
>> #4  0xf7dde34f in opal_event_loop (flags=2) at ../../../opal/event/event.c:730
>> #5  0xf7dcfc77 in opal_progress () at ../../opal/runtime/opal_progress.c:189
>> #6  0xf7ea80b8 in opal_condition_wait (c=0xf7f25160, m=0xf7f251a0) at ../../opal/threads/condition.h:99
>> #7  0xf7ea7ff3 in ompi_request_wait_completion (req=0x8686680) at ../../ompi/request/request.h:375
>> #8  0xf7ea7ef1 in ompi_request_default_wait (req_ptr=0xff822ae8, status=0x0) at ../../ompi/request/req_wait.c:37
>> #9  0xf7c663a6 in ompi_coll_tuned_bcast_intra_generic (buffer=0xff822d20, original_count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, count_by_segment=1, tree=0x868b3d8) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:237
>> #10 0xf7c668ea in ompi_coll_tuned_bcast_intra_binomial (buffer=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, segsize=0) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:368
>> #11 0xf7c5af12 in ompi_coll_tuned_bcast_intra_dec_fixed (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700) at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256
>> #12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at ../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44
>> #13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64, scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188, disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300, module=0x86aae18) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134
>> #14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300, high=0) at ../../ompi/communicator/comm.c:1199
>> #15 0xf7ed7833 in PMPI_Intercomm_merge (intercomm=0x86aa300, high=0, newcomm=0xff8241d0) at pintercomm_merge.c:84
>> #16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47
>>
>> What do you think may cause the problem?
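For context, these frames (MPI_Intercomm_merge called directly from main at server.c:50 and client.c:47, both ending up in the inter-allgatherv inside ompi_comm_determine_first) are consistent with an accept-and-merge loop of roughly the following shape, where current_intracomm grows by one process per accepted client. This is only a sketch under that assumption, not the actual server.c.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm current_intracomm;
    int i, clients;

    MPI_Init(&argc, &argv);
    current_intracomm = MPI_COMM_WORLD;        /* grows as clients join */
    clients = (argc > 1) ? atoi(argv[1]) : 1;

    MPI_Open_port(MPI_INFO_NULL, port);
    printf("%s\n", port);                      /* clients must obtain this port name somehow */

    for (i = 0; i < clients; i++) {
        MPI_Comm intercomm, merged;

        /* every process already in current_intracomm takes part in the accept */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, current_intracomm, &intercomm);

        /* the reported hang occurs inside this call */
        MPI_Intercomm_merge(intercomm, /* high = */ 0, &merged);

        MPI_Comm_free(&intercomm);
        current_intracomm = merged;            /* n processes become n + 1 */
    }

    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}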
>> >> >> 2010/7/26 Ralph Castain <r...@open-mpi.org>: >>> No problem at all - glad it works! >>> >>> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote: >>> >>>> Hi, >>>> I'm very sorry, but the problem was on my side. My installation >>>> process was not always taking the newest sources of openmpi. In this >>>> case it hasn't installed the version with the latest patch. Now I >>>> think everything works fine - I could run over 130 processes with no >>>> problems. >>>> I'm sorry again that I've wasted your time. And thank you for the patch. >>>> >>>> 2010/7/21 Ralph Castain <r...@open-mpi.org>: >>>>> We're having some problem replicating this once my patches are applied. >>>>> Can you send us your configure cmd? Just the output from "head >>>>> config.log" will do for now. >>>>> >>>>> Thanks! >>>>> >>>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote: >>>>> >>>>>> My start script looks almost exactly the same as the one published by >>>>>> Edgar, ie. the processes are starting one by one with no delay. >>>>>> >>>>>> 2010/7/20 Ralph Castain <r...@open-mpi.org>: >>>>>>> Grzegorz: something occurred to me. When you start all these processes, >>>>>>> how are you staggering their wireup? Are they flooding us, or are you >>>>>>> time-shifting them a little? >>>>>>> >>>>>>> >>>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote: >>>>>>> >>>>>>>> Hm, so I am not sure how to approach this. First of all, the test case >>>>>>>> works for me. I used up to 80 clients, and for both optimized and >>>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4 >>>>>>>> series, but the communicator code is identical in both cases). Clearly, >>>>>>>> the patch from Ralph is necessary to make it work. >>>>>>>> >>>>>>>> Additionally, I went through the communicator creation code for dynamic >>>>>>>> communicators trying to find spots that could create problems. The only >>>>>>>> place that I found the number 64 appear is the fortran-to-c mapping >>>>>>>> arrays (e.g. for communicators), where the initial size of the table is >>>>>>>> 64. I looked twice over the pointer-array code to see whether we could >>>>>>>> have a problem their (since it is a key-piece of the cid allocation >>>>>>>> code >>>>>>>> for communicators), but I am fairly confident that it is correct. >>>>>>>> >>>>>>>> Note, that we have other (non-dynamic tests), were comm_set is called >>>>>>>> 100,000 times, and the code per se does not seem to have a problem due >>>>>>>> to being called too often. So I am not sure what else to look at. >>>>>>>> >>>>>>>> Edgar >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote: >>>>>>>>> As far as I can tell, it appears the problem is somewhere in our >>>>>>>>> communicator setup. The people knowledgeable on that area are going >>>>>>>>> to look into it later this week. >>>>>>>>> >>>>>>>>> I'm creating a ticket to track the problem and will copy you on it. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote: >>>>>>>>>> >>>>>>>>>>> Bad news.. >>>>>>>>>>> I've tried the latest patch with and without the prior one, but it >>>>>>>>>>> hasn't changed anything. I've also tried using the old code but with >>>>>>>>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also >>>>>>>>>>> didn't >>>>>>>>>>> help. >>>>>>>>>>> While looking through the sources of openmpi-1.4.2 I couldn't find >>>>>>>>>>> any >>>>>>>>>>> call of the function ompi_dpm_base_mark_dyncomm. 
>>>>>>>>>> >>>>>>>>>> It isn't directly called - it shows in ompi_comm_set as >>>>>>>>>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, >>>>>>>>>> but I guess something else is also being hit. Have to look further... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>> Just so you don't have to wait for 1.4.3 release, here is the >>>>>>>>>>>> patch (doesn't include the prior patch). >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote: >>>>>>>>>>>> >>>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>> Dug around a bit and found the problem!! >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have no idea who or why this was done, but somebody set a >>>>>>>>>>>>>> limit of 64 separate jobids in the dynamic init called by >>>>>>>>>>>>>> ompi_comm_set, which builds the intercommunicator. >>>>>>>>>>>>>> Unfortunately, they hard-wired the array size, but never check >>>>>>>>>>>>>> that size before adding to it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So after 64 calls to connect_accept, you are overwriting other >>>>>>>>>>>>>> areas of the code. As you found, hitting 66 causes it to >>>>>>>>>>>>>> segfault. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'll fix this on the developer's trunk (I'll also add that >>>>>>>>>>>>>> original patch to it). Rather than my searching this thread in >>>>>>>>>>>>>> detail, can you remind me what version you are using so I can >>>>>>>>>>>>>> patch it too? >>>>>>>>>>>>> >>>>>>>>>>>>> I'm using 1.4.2 >>>>>>>>>>>>> Thanks a lot and I'm looking forward for the patch. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for your patience with this! >>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't change >>>>>>>>>>>>>>> anything. >>>>>>>>>>>>>>> Following your advice I've run my process using gdb. >>>>>>>>>>>>>>> Unfortunately I >>>>>>>>>>>>>>> didn't get anything more than: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>>>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)] >>>>>>>>>>>>>>> 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (gdb) bt >>>>>>>>>>>>>>> #0 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>>> #1 0xf7e3ba95 in connect_accept () from >>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so >>>>>>>>>>>>>>> #2 0xf7f62013 in PMPI_Comm_connect () from >>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>>> #3 0x080489ed in main (argc=825832753, argv=0x34393638) at >>>>>>>>>>>>>>> client.c:43 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What's more: when I've added a breakpoint on ompi_comm_set in >>>>>>>>>>>>>>> 66th >>>>>>>>>>>>>>> process and stepped a couple of instructions, one of the other >>>>>>>>>>>>>>> processes crashed (as usualy on ompi_comm_set) earlier than >>>>>>>>>>>>>>> 66th did. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Finally I decided to recompile openmpi using -g flag for gcc. >>>>>>>>>>>>>>> In this >>>>>>>>>>>>>>> case the 66 processes issue has gone! I was running my >>>>>>>>>>>>>>> applications >>>>>>>>>>>>>>> exactly the same way as previously (even without recompilation) >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> I've run successfully over 130 processes. 
>>>>>>>>>>>>>>> When switching back to the openmpi compilation without -g it >>>>>>>>>>>>>>> again segfaults. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Any ideas? I'm really confused. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves >>>>>>>>>>>>>>>> the same way when spread across multiple machines, I would >>>>>>>>>>>>>>>> suspect it is somewhere in your program itself. Given that the >>>>>>>>>>>>>>>> segfault is in your process, can you use gdb to look at the >>>>>>>>>>>>>>>> core file and see where and why it fails? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>>> sorry for the late response, but I couldn't find free time >>>>>>>>>>>>>>>>>>> to play >>>>>>>>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. >>>>>>>>>>>>>>>>>>> I've launched >>>>>>>>>>>>>>>>>>> my processes in the way you've described and I think it's >>>>>>>>>>>>>>>>>>> working as >>>>>>>>>>>>>>>>>>> you expected. None of my processes runs the orted daemon >>>>>>>>>>>>>>>>>>> and they can >>>>>>>>>>>>>>>>>>> perform MPI operations. Unfortunately I'm still hitting the >>>>>>>>>>>>>>>>>>> 65 >>>>>>>>>>>>>>>>>>> processes issue :( >>>>>>>>>>>>>>>>>>> Maybe I'm doing something wrong. >>>>>>>>>>>>>>>>>>> I attach my source code. If anybody could have a look on >>>>>>>>>>>>>>>>>>> this, I would >>>>>>>>>>>>>>>>>>> be grateful. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> When I run that code with clients_count <= 65 everything >>>>>>>>>>>>>>>>>>> works fine: >>>>>>>>>>>>>>>>>>> all the processes create a common grid, exchange some >>>>>>>>>>>>>>>>>>> information and >>>>>>>>>>>>>>>>>>> disconnect. >>>>>>>>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on >>>>>>>>>>>>>>>>>>> MPI_Comm_connect (segmentation fault). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I didn't have time to check the code, but my guess is that >>>>>>>>>>>>>>>>>> you are still hitting some kind of file descriptor or other >>>>>>>>>>>>>>>>>> limit. Check to see what your limits are - usually "ulimit" >>>>>>>>>>>>>>>>>> will tell you. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> My limitations are: >>>>>>>>>>>>>>>>> time(seconds) unlimited >>>>>>>>>>>>>>>>> file(blocks) unlimited >>>>>>>>>>>>>>>>> data(kb) unlimited >>>>>>>>>>>>>>>>> stack(kb) 10240 >>>>>>>>>>>>>>>>> coredump(blocks) 0 >>>>>>>>>>>>>>>>> memory(kb) unlimited >>>>>>>>>>>>>>>>> locked memory(kb) 64 >>>>>>>>>>>>>>>>> process 200704 >>>>>>>>>>>>>>>>> nofiles 1024 >>>>>>>>>>>>>>>>> vmemory(kb) unlimited >>>>>>>>>>>>>>>>> locks unlimited >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Which one do you think could be responsible for that? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was trying to run all the 66 processes on one machine or >>>>>>>>>>>>>>>>> spread them >>>>>>>>>>>>>>>>> across several machines and it always crashes the same way on >>>>>>>>>>>>>>>>> the 66th >>>>>>>>>>>>>>>>> process. 
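As a side note on the limits listed above: the per-process open-file limit (the "nofiles 1024" entry) is the one Ralph guesses at, and it can also be inspected, and raised up to the hard limit, from inside the process itself. A small sketch, assuming a POSIX system:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long) rl.rlim_cur,
           (unsigned long long) rl.rlim_max);

    /* raise the soft limit as far as the hard limit allows */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit");
    return 0;
}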
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Another thing I would like to know is if it's normal that >>>>>>>>>>>>>>>>>>> any of my >>>>>>>>>>>>>>>>>>> processes when calling MPI_Comm_connect or MPI_Comm_accept >>>>>>>>>>>>>>>>>>> when the >>>>>>>>>>>>>>>>>>> other side is not ready, is eating up a full CPU available. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yes - the waiting process is polling in a tight loop waiting >>>>>>>>>>>>>>>>>> for the connection to be made. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Any help would be appreciated, >>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does >>>>>>>>>>>>>>>>>>>> pretty much what you >>>>>>>>>>>>>>>>>>>> want. Checkout "man ompi-server". I originally wrote that >>>>>>>>>>>>>>>>>>>> code to support >>>>>>>>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but we >>>>>>>>>>>>>>>>>>>> can utilize it >>>>>>>>>>>>>>>>>>>> here too. Have to blame me for not making it more publicly >>>>>>>>>>>>>>>>>>>> known. >>>>>>>>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the >>>>>>>>>>>>>>>>>>>> singleton startup >>>>>>>>>>>>>>>>>>>> to provide your desired support. This solution works in >>>>>>>>>>>>>>>>>>>> the following >>>>>>>>>>>>>>>>>>>> manner: >>>>>>>>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This >>>>>>>>>>>>>>>>>>>> starts a persistent >>>>>>>>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous >>>>>>>>>>>>>>>>>>>> point for >>>>>>>>>>>>>>>>>>>> independently started applications. The problem with >>>>>>>>>>>>>>>>>>>> starting different >>>>>>>>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies >>>>>>>>>>>>>>>>>>>> in the need to have >>>>>>>>>>>>>>>>>>>> the applications find each other. If they can't discover >>>>>>>>>>>>>>>>>>>> contact info for >>>>>>>>>>>>>>>>>>>> the other app, then they can't wire up their >>>>>>>>>>>>>>>>>>>> interconnects. The >>>>>>>>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I don't >>>>>>>>>>>>>>>>>>>> like that >>>>>>>>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out. >>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename>" in the >>>>>>>>>>>>>>>>>>>> environment where you >>>>>>>>>>>>>>>>>>>> will start your processes. This will allow your singleton >>>>>>>>>>>>>>>>>>>> processes to find >>>>>>>>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to >>>>>>>>>>>>>>>>>>>> connect the MPI >>>>>>>>>>>>>>>>>>>> publish/subscribe system for you. >>>>>>>>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, >>>>>>>>>>>>>>>>>>>> they will detect >>>>>>>>>>>>>>>>>>>> the presence of the above envar and automatically connect >>>>>>>>>>>>>>>>>>>> themselves to the >>>>>>>>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with the >>>>>>>>>>>>>>>>>>>> ability to perform >>>>>>>>>>>>>>>>>>>> any MPI-2 operation. >>>>>>>>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully >>>>>>>>>>>>>>>>>>>> it will meet your >>>>>>>>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so >>>>>>>>>>>>>>>>>>>> long as you locate >>>>>>>>>>>>>>>>>>>> it where all of the processes can find the contact file >>>>>>>>>>>>>>>>>>>> and can open a TCP >>>>>>>>>>>>>>>>>>>> socket to the daemon. 
There is a way to knit multiple >>>>>>>>>>>>>>>>>>>> ompi-servers into a >>>>>>>>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot >>>>>>>>>>>>>>>>>>>> directly access a >>>>>>>>>>>>>>>>>>>> server due to network segmentation), but it's a tad tricky >>>>>>>>>>>>>>>>>>>> - let me know if >>>>>>>>>>>>>>>>>>>> you require it and I'll try to help. >>>>>>>>>>>>>>>>>>>> If you have trouble wiring them all into a single >>>>>>>>>>>>>>>>>>>> communicator, you might >>>>>>>>>>>>>>>>>>>> ask separately about that and see if one of our MPI >>>>>>>>>>>>>>>>>>>> experts can provide >>>>>>>>>>>>>>>>>>>> advice (I'm just the RTE grunt). >>>>>>>>>>>>>>>>>>>> HTH - let me know how this works for you and I'll >>>>>>>>>>>>>>>>>>>> incorporate it into future >>>>>>>>>>>>>>>>>>>> OMPI releases. >>>>>>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this >>>>>>>>>>>>>>>>>>>> our small >>>>>>>>>>>>>>>>>>>> project/experiment. >>>>>>>>>>>>>>>>>>>> We definitely would like to give your patch a try. But >>>>>>>>>>>>>>>>>>>> could you please >>>>>>>>>>>>>>>>>>>> explain your solution a little more? >>>>>>>>>>>>>>>>>>>> You still would like to start one mpirun per mpi grid, and >>>>>>>>>>>>>>>>>>>> then have >>>>>>>>>>>>>>>>>>>> processes started by us to join the MPI comm? >>>>>>>>>>>>>>>>>>>> It is a good solution of course. >>>>>>>>>>>>>>>>>>>> But it would be especially preferable to have one daemon >>>>>>>>>>>>>>>>>>>> running >>>>>>>>>>>>>>>>>>>> persistently on our "entry" machine that can handle >>>>>>>>>>>>>>>>>>>> several mpi grid starts. >>>>>>>>>>>>>>>>>>>> Can your patch help us this way too? >>>>>>>>>>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>>>>>>>>>> Krzysztof >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> In thinking about this, my proposed solution won't >>>>>>>>>>>>>>>>>>>>> entirely fix the >>>>>>>>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I >>>>>>>>>>>>>>>>>>>>> believe I can >>>>>>>>>>>>>>>>>>>>> resolve that one as well, but it would require a patch. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Would you like me to send you something you could try? >>>>>>>>>>>>>>>>>>>>> Might take a couple >>>>>>>>>>>>>>>>>>>>> of iterations to get it right... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee >>>>>>>>>>>>>>>>>>>>>> it: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using >>>>>>>>>>>>>>>>>>>>>> mpirun that includes >>>>>>>>>>>>>>>>>>>>>> the following option: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> mpirun -report-uri file >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> where file is some filename that mpirun can create and >>>>>>>>>>>>>>>>>>>>>> insert its >>>>>>>>>>>>>>>>>>>>>> contact info into it. This can be a relative or absolute >>>>>>>>>>>>>>>>>>>>>> path. This process >>>>>>>>>>>>>>>>>>>>>> must remain alive throughout your application - doesn't >>>>>>>>>>>>>>>>>>>>>> matter what it does. >>>>>>>>>>>>>>>>>>>>>> It's purpose is solely to keep mpirun alive. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2. 
set OMPI_MCA_dpm_orte_server=FILE:file in your >>>>>>>>>>>>>>>>>>>>>> environment, where >>>>>>>>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your >>>>>>>>>>>>>>>>>>>>>> processes how to >>>>>>>>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to >>>>>>>>>>>>>>>>>>>>>> handle the connect/accept >>>>>>>>>>>>>>>>>>>>>> operations >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to >>>>>>>>>>>>>>>>>>>>>> each other. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that >>>>>>>>>>>>>>>>>>>>>> these processes >>>>>>>>>>>>>>>>>>>>>> will all have the same rank && name since they all start >>>>>>>>>>>>>>>>>>>>>> as singletons. >>>>>>>>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some >>>>>>>>>>>>>>>>>>>>>>> process that I >>>>>>>>>>>>>>>>>>>>>>> could run once on my system and it could help in >>>>>>>>>>>>>>>>>>>>>>> creating those >>>>>>>>>>>>>>>>>>>>>>> groups. >>>>>>>>>>>>>>>>>>>>>>> My typical scenario is: >>>>>>>>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun >>>>>>>>>>>>>>>>>>>>>>> 2. connect them into MPI group >>>>>>>>>>>>>>>>>>>>>>> 3. do some job >>>>>>>>>>>>>>>>>>>>>>> 4. exit all N processes >>>>>>>>>>>>>>>>>>>>>>> 5. goto 1 >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>: >>>>>>>>>>>>>>>>>>>>>>>> Thank you Ralph for your explanation. >>>>>>>>>>>>>>>>>>>>>>>> And, apart from that descriptors' issue, is there any >>>>>>>>>>>>>>>>>>>>>>>> other way to >>>>>>>>>>>>>>>>>>>>>>>> solve my problem, i.e. to run separately a number of >>>>>>>>>>>>>>>>>>>>>>>> processes, >>>>>>>>>>>>>>>>>>>>>>>> without mpirun and then to collect them into an MPI >>>>>>>>>>>>>>>>>>>>>>>> intracomm group? >>>>>>>>>>>>>>>>>>>>>>>> If I for example would need to run some 'server >>>>>>>>>>>>>>>>>>>>>>>> process' (even using >>>>>>>>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use >>>>>>>>>>>>>>>>>>>>>>>>> mpirun, and are not >>>>>>>>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" >>>>>>>>>>>>>>>>>>>>>>>>> launch (i.e., starting >>>>>>>>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of those >>>>>>>>>>>>>>>>>>>>>>>>> processes thinks it is >>>>>>>>>>>>>>>>>>>>>>>>> a singleton - yes? >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton >>>>>>>>>>>>>>>>>>>>>>>>> immediately >>>>>>>>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to >>>>>>>>>>>>>>>>>>>>>>>>> behave just like mpirun. >>>>>>>>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 operations >>>>>>>>>>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc. 
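For reference, the rendezvous point that ompi-server (or a persistent mpirun started with -report-uri) provides is what makes the standard MPI-2 name service usable across independently started processes. A minimal sketch of that publish/lookup pattern, with the service name "grid" being an arbitrary assumption and the environment variable taken from the recipes above:

#include <mpi.h>

/* One side opens a port and publishes it under a well-known service name.
 * With OMPI_MCA_orte_server (or OMPI_MCA_dpm_orte_server, as described above)
 * pointing at the rendezvous daemon, the published name is visible to the
 * other singletons. */
void open_and_accept(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("grid", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
    MPI_Unpublish_name("grid", MPI_INFO_NULL, port);
    MPI_Close_port(port);
}

/* The other side looks the port up and connects. */
void lookup_and_connect(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Lookup_name("grid", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
}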
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are >>>>>>>>>>>>>>>>>>>>>>>>> singletons, then >>>>>>>>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This >>>>>>>>>>>>>>>>>>>>>>>>> eats up a lot of file >>>>>>>>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting >>>>>>>>>>>>>>>>>>>>>>>>> this 65 process limit - >>>>>>>>>>>>>>>>>>>>>>>>> your system is probably running out of file >>>>>>>>>>>>>>>>>>>>>>>>> descriptors. You might check you >>>>>>>>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised >>>>>>>>>>>>>>>>>>>>>>>>> upward. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use some >>>>>>>>>>>>>>>>>>>>>>>>>> special way for >>>>>>>>>>>>>>>>>>>>>>>>>> running my processes provided by the environment in >>>>>>>>>>>>>>>>>>>>>>>>>> which I'm >>>>>>>>>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>>>>>>>>> and unfortunately I can't use mpirun. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - >>>>>>>>>>>>>>>>>>>>>>>>>>> all it does is >>>>>>>>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. >>>>>>>>>>>>>>>>>>>>>>>>>>> It mainly sits there >>>>>>>>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to >>>>>>>>>>>>>>>>>>>>>>>>>>> support the job. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, >>>>>>>>>>>>>>>>>>>>>>>>>>> I know of no >>>>>>>>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of >>>>>>>>>>>>>>>>>>>>>>>>>>>> processes communicating >>>>>>>>>>>>>>>>>>>>>>>>>>>> via >>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without mpirun >>>>>>>>>>>>>>>>>>>>>>>>>>>> and create >>>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator after the startup. Any ideas how >>>>>>>>>>>>>>>>>>>>>>>>>>>> to do this >>>>>>>>>>>>>>>>>>>>>>>>>>>> efficiently? >>>>>>>>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes >>>>>>>>>>>>>>>>>>>>>>>>>>>> are connecting >>>>>>>>>>>>>>>>>>>>>>>>>>>> one by >>>>>>>>>>>>>>>>>>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately all >>>>>>>>>>>>>>>>>>>>>>>>>>>> the processes >>>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>>> are already in the group need to call >>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. This means >>>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect I need to >>>>>>>>>>>>>>>>>>>>>>>>>>>> collect all the >>>>>>>>>>>>>>>>>>>>>>>>>>>> n-1 >>>>>>>>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I run >>>>>>>>>>>>>>>>>>>>>>>>>>>> about 40 >>>>>>>>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, >>>>>>>>>>>>>>>>>>>>>>>>>>>> which I'd like to >>>>>>>>>>>>>>>>>>>>>>>>>>>> avoid. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I try to connect the 66th process, the root of the existing group segfaults on MPI_Comm_accept. Maybe it's my bug, but it's weird, as everything works fine for at most 65 processes. Is there any limitation I don't know about? My last question is about MPI_COMM_WORLD. When I run my processes without mpirun, their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is there any way to change MPI_COMM_WORLD and set it to the intracommunicator that I've created? Thanks, Grzegorz Maj
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab, http://pstl.cs.uh.edu
Department of Computer Science, University of Houston
Philip G. Hoffman Hall, Room 524, Houston, TX-77204, USA
Tel: +1 (713) 743-3857   Fax: +1 (713) 743-3335