My start script looks almost exactly the same as the one published by Edgar, i.e. the processes are started one by one with no delay between them.
2010/7/20 Ralph Castain <r...@open-mpi.org>: > Grzegorz: something occurred to me. When you start all these processes, how > are you staggering their wireup? Are they flooding us, or are you > time-shifting them a little? > > > On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote: > >> Hm, so I am not sure how to approach this. First of all, the test case >> works for me. I used up to 80 clients, and for both optimized and >> non-optimized compilation. I ran the tests with trunk (not with 1.4 >> series, but the communicator code is identical in both cases). Clearly, >> the patch from Ralph is necessary to make it work. >> >> Additionally, I went through the communicator creation code for dynamic >> communicators trying to find spots that could create problems. The only >> place that I found the number 64 appear is the fortran-to-c mapping >> arrays (e.g. for communicators), where the initial size of the table is >> 64. I looked twice over the pointer-array code to see whether we could >> have a problem their (since it is a key-piece of the cid allocation code >> for communicators), but I am fairly confident that it is correct. >> >> Note, that we have other (non-dynamic tests), were comm_set is called >> 100,000 times, and the code per se does not seem to have a problem due >> to being called too often. So I am not sure what else to look at. >> >> Edgar >> >> >> >> On 7/13/2010 8:42 PM, Ralph Castain wrote: >>> As far as I can tell, it appears the problem is somewhere in our >>> communicator setup. The people knowledgeable on that area are going to look >>> into it later this week. >>> >>> I'm creating a ticket to track the problem and will copy you on it. >>> >>> >>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote: >>> >>>> >>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote: >>>> >>>>> Bad news.. >>>>> I've tried the latest patch with and without the prior one, but it >>>>> hasn't changed anything. I've also tried using the old code but with >>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also didn't >>>>> help. >>>>> While looking through the sources of openmpi-1.4.2 I couldn't find any >>>>> call of the function ompi_dpm_base_mark_dyncomm. >>>> >>>> It isn't directly called - it shows in ompi_comm_set as >>>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, but I >>>> guess something else is also being hit. Have to look further... >>>> >>>> >>>>> >>>>> >>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>> Just so you don't have to wait for 1.4.3 release, here is the patch >>>>>> (doesn't include the prior patch). >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote: >>>>>> >>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>> Dug around a bit and found the problem!! >>>>>>>> >>>>>>>> I have no idea who or why this was done, but somebody set a limit of >>>>>>>> 64 separate jobids in the dynamic init called by ompi_comm_set, which >>>>>>>> builds the intercommunicator. Unfortunately, they hard-wired the array >>>>>>>> size, but never check that size before adding to it. >>>>>>>> >>>>>>>> So after 64 calls to connect_accept, you are overwriting other areas >>>>>>>> of the code. As you found, hitting 66 causes it to segfault. >>>>>>>> >>>>>>>> I'll fix this on the developer's trunk (I'll also add that original >>>>>>>> patch to it). Rather than my searching this thread in detail, can you >>>>>>>> remind me what version you are using so I can patch it too? 
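For anyone following this thread, the failure mode Ralph describes boils down to a fixed-size table that is appended to without a bounds check. The sketch below is purely illustrative (it is not the Open MPI source; the names and the constant are made up) and just shows the unchecked write next to the obvious guard:

    /* Illustrative only -- NOT the actual Open MPI code.  A hard-wired
     * table of jobids that connect/accept-style code appends to. */

    #include <stdio.h>
    #include <stdint.h>

    #define MAX_DYN_JOBIDS 64               /* hypothetical hard-wired limit */

    static uint32_t dyn_jobids[MAX_DYN_JOBIDS];
    static int      num_dyn_jobids = 0;

    /* Buggy pattern: the 65th call writes past the end of the array and
     * silently corrupts whatever happens to sit next to it in memory. */
    static void add_jobid_unchecked(uint32_t jobid)
    {
        dyn_jobids[num_dyn_jobids++] = jobid;
    }

    /* Guarded pattern: fail loudly (or grow the table) instead of overflowing. */
    static int add_jobid_checked(uint32_t jobid)
    {
        if (num_dyn_jobids >= MAX_DYN_JOBIDS) {
            fprintf(stderr, "jobid table full (%d entries)\n", MAX_DYN_JOBIDS);
            return -1;
        }
        dyn_jobids[num_dyn_jobids++] = jobid;
        return 0;
    }

An out-of-bounds write like this would also explain why the symptom moves around: which neighbouring data gets clobbered depends on memory layout, so recompiling with different flags (e.g. with -g, as reported further down the thread) can shift or hide the crash.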
>>>>>>> >>>>>>> I'm using 1.4.2 >>>>>>> Thanks a lot and I'm looking forward for the patch. >>>>>>> >>>>>>>> >>>>>>>> Thanks for your patience with this! >>>>>>>> Ralph >>>>>>>> >>>>>>>> >>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote: >>>>>>>> >>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't change anything. >>>>>>>>> Following your advice I've run my process using gdb. Unfortunately I >>>>>>>>> didn't get anything more than: >>>>>>>>> >>>>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)] >>>>>>>>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>> >>>>>>>>> (gdb) bt >>>>>>>>> #0 0xf7f39905 in ompi_comm_set () from >>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>> #1 0xf7e3ba95 in connect_accept () from >>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so >>>>>>>>> #2 0xf7f62013 in PMPI_Comm_connect () from >>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>> #3 0x080489ed in main (argc=825832753, argv=0x34393638) at >>>>>>>>> client.c:43 >>>>>>>>> >>>>>>>>> What's more: when I've added a breakpoint on ompi_comm_set in 66th >>>>>>>>> process and stepped a couple of instructions, one of the other >>>>>>>>> processes crashed (as usualy on ompi_comm_set) earlier than 66th did. >>>>>>>>> >>>>>>>>> Finally I decided to recompile openmpi using -g flag for gcc. In this >>>>>>>>> case the 66 processes issue has gone! I was running my applications >>>>>>>>> exactly the same way as previously (even without recompilation) and >>>>>>>>> I've run successfully over 130 processes. >>>>>>>>> When switching back to the openmpi compilation without -g it again >>>>>>>>> segfaults. >>>>>>>>> >>>>>>>>> Any ideas? I'm really confused. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves the >>>>>>>>>> same way when spread across multiple machines, I would suspect it is >>>>>>>>>> somewhere in your program itself. Given that the segfault is in your >>>>>>>>>> process, can you use gdb to look at the core file and see where and >>>>>>>>>> why it fails? >>>>>>>>>> >>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote: >>>>>>>>>> >>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>> >>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>> sorry for the late response, but I couldn't find free time to play >>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. I've >>>>>>>>>>>>> launched >>>>>>>>>>>>> my processes in the way you've described and I think it's working >>>>>>>>>>>>> as >>>>>>>>>>>>> you expected. None of my processes runs the orted daemon and they >>>>>>>>>>>>> can >>>>>>>>>>>>> perform MPI operations. Unfortunately I'm still hitting the 65 >>>>>>>>>>>>> processes issue :( >>>>>>>>>>>>> Maybe I'm doing something wrong. >>>>>>>>>>>>> I attach my source code. If anybody could have a look on this, I >>>>>>>>>>>>> would >>>>>>>>>>>>> be grateful. >>>>>>>>>>>>> >>>>>>>>>>>>> When I run that code with clients_count <= 65 everything works >>>>>>>>>>>>> fine: >>>>>>>>>>>>> all the processes create a common grid, exchange some information >>>>>>>>>>>>> and >>>>>>>>>>>>> disconnect. >>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on >>>>>>>>>>>>> MPI_Comm_connect (segmentation fault). 
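One practical note on Ralph's suggestion to inspect the core file: the limits quoted just below show coredump(blocks) set to 0, so a segfaulting process never leaves a core file to load into gdb. Running "ulimit -c unlimited" in the launching shell fixes that; alternatively, a small helper (hypothetical, not part of the original client.c, assuming a POSIX system) can raise the soft limit from inside the process:

    /* Hypothetical helper, not part of the original client.c: raise the
     * core-file size soft limit to the hard limit so a segfault leaves a
     * core file that gdb can load.  Assumes a POSIX system. */

    #include <stdio.h>
    #include <sys/resource.h>

    static void enable_core_dumps(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_CORE, &rl) == 0) {
            rl.rlim_cur = rl.rlim_max;      /* soft limit up to the hard limit */
            if (setrlimit(RLIMIT_CORE, &rl) != 0) {
                perror("setrlimit(RLIMIT_CORE)");
            }
        }
    }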
>>>>>>>>>>>> >>>>>>>>>>>> I didn't have time to check the code, but my guess is that you are >>>>>>>>>>>> still hitting some kind of file descriptor or other limit. Check >>>>>>>>>>>> to see what your limits are - usually "ulimit" will tell you. >>>>>>>>>>> >>>>>>>>>>> My limitations are: >>>>>>>>>>> time(seconds) unlimited >>>>>>>>>>> file(blocks) unlimited >>>>>>>>>>> data(kb) unlimited >>>>>>>>>>> stack(kb) 10240 >>>>>>>>>>> coredump(blocks) 0 >>>>>>>>>>> memory(kb) unlimited >>>>>>>>>>> locked memory(kb) 64 >>>>>>>>>>> process 200704 >>>>>>>>>>> nofiles 1024 >>>>>>>>>>> vmemory(kb) unlimited >>>>>>>>>>> locks unlimited >>>>>>>>>>> >>>>>>>>>>> Which one do you think could be responsible for that? >>>>>>>>>>> >>>>>>>>>>> I was trying to run all the 66 processes on one machine or spread >>>>>>>>>>> them >>>>>>>>>>> across several machines and it always crashes the same way on the >>>>>>>>>>> 66th >>>>>>>>>>> process. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Another thing I would like to know is if it's normal that any of >>>>>>>>>>>>> my >>>>>>>>>>>>> processes when calling MPI_Comm_connect or MPI_Comm_accept when >>>>>>>>>>>>> the >>>>>>>>>>>>> other side is not ready, is eating up a full CPU available. >>>>>>>>>>>> >>>>>>>>>>>> Yes - the waiting process is polling in a tight loop waiting for >>>>>>>>>>>> the connection to be made. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Any help would be appreciated, >>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does pretty >>>>>>>>>>>>>> much what you >>>>>>>>>>>>>> want. Checkout "man ompi-server". I originally wrote that code >>>>>>>>>>>>>> to support >>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but we can >>>>>>>>>>>>>> utilize it >>>>>>>>>>>>>> here too. Have to blame me for not making it more publicly known. >>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the >>>>>>>>>>>>>> singleton startup >>>>>>>>>>>>>> to provide your desired support. This solution works in the >>>>>>>>>>>>>> following >>>>>>>>>>>>>> manner: >>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This starts a >>>>>>>>>>>>>> persistent >>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous point for >>>>>>>>>>>>>> independently started applications. The problem with starting >>>>>>>>>>>>>> different >>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies in the >>>>>>>>>>>>>> need to have >>>>>>>>>>>>>> the applications find each other. If they can't discover contact >>>>>>>>>>>>>> info for >>>>>>>>>>>>>> the other app, then they can't wire up their interconnects. The >>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I don't like >>>>>>>>>>>>>> that >>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out. >>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename>" in the environment >>>>>>>>>>>>>> where you >>>>>>>>>>>>>> will start your processes. This will allow your singleton >>>>>>>>>>>>>> processes to find >>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to connect >>>>>>>>>>>>>> the MPI >>>>>>>>>>>>>> publish/subscribe system for you. >>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, they >>>>>>>>>>>>>> will detect >>>>>>>>>>>>>> the presence of the above envar and automatically connect >>>>>>>>>>>>>> themselves to the >>>>>>>>>>>>>> "ompi-server" daemon. 
This provides each process with the >>>>>>>>>>>>>> ability to perform >>>>>>>>>>>>>> any MPI-2 operation. >>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully it will >>>>>>>>>>>>>> meet your >>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so long as >>>>>>>>>>>>>> you locate >>>>>>>>>>>>>> it where all of the processes can find the contact file and can >>>>>>>>>>>>>> open a TCP >>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple >>>>>>>>>>>>>> ompi-servers into a >>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot directly >>>>>>>>>>>>>> access a >>>>>>>>>>>>>> server due to network segmentation), but it's a tad tricky - let >>>>>>>>>>>>>> me know if >>>>>>>>>>>>>> you require it and I'll try to help. >>>>>>>>>>>>>> If you have trouble wiring them all into a single communicator, >>>>>>>>>>>>>> you might >>>>>>>>>>>>>> ask separately about that and see if one of our MPI experts can >>>>>>>>>>>>>> provide >>>>>>>>>>>>>> advice (I'm just the RTE grunt). >>>>>>>>>>>>>> HTH - let me know how this works for you and I'll incorporate it >>>>>>>>>>>>>> into future >>>>>>>>>>>>>> OMPI releases. >>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this our small >>>>>>>>>>>>>> project/experiment. >>>>>>>>>>>>>> We definitely would like to give your patch a try. But could you >>>>>>>>>>>>>> please >>>>>>>>>>>>>> explain your solution a little more? >>>>>>>>>>>>>> You still would like to start one mpirun per mpi grid, and then >>>>>>>>>>>>>> have >>>>>>>>>>>>>> processes started by us to join the MPI comm? >>>>>>>>>>>>>> It is a good solution of course. >>>>>>>>>>>>>> But it would be especially preferable to have one daemon running >>>>>>>>>>>>>> persistently on our "entry" machine that can handle several mpi >>>>>>>>>>>>>> grid starts. >>>>>>>>>>>>>> Can your patch help us this way too? >>>>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>>>> Krzysztof >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In thinking about this, my proposed solution won't entirely fix >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I >>>>>>>>>>>>>>> believe I can >>>>>>>>>>>>>>> resolve that one as well, but it would require a patch. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Would you like me to send you something you could try? Might >>>>>>>>>>>>>>> take a couple >>>>>>>>>>>>>>> of iterations to get it right... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee it: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using mpirun >>>>>>>>>>>>>>>> that includes >>>>>>>>>>>>>>>> the following option: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> mpirun -report-uri file >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> where file is some filename that mpirun can create and insert >>>>>>>>>>>>>>>> its >>>>>>>>>>>>>>>> contact info into it. This can be a relative or absolute path. >>>>>>>>>>>>>>>> This process >>>>>>>>>>>>>>>> must remain alive throughout your application - doesn't matter >>>>>>>>>>>>>>>> what it does. >>>>>>>>>>>>>>>> It's purpose is solely to keep mpirun alive. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2. 
set OMPI_MCA_dpm_orte_server=FILE:file in your environment, >>>>>>>>>>>>>>>> where >>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your >>>>>>>>>>>>>>>> processes how to >>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to handle the >>>>>>>>>>>>>>>> connect/accept >>>>>>>>>>>>>>>> operations >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to each >>>>>>>>>>>>>>>> other. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that these >>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>> will all have the same rank && name since they all start as >>>>>>>>>>>>>>>> singletons. >>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some process >>>>>>>>>>>>>>>>> that I >>>>>>>>>>>>>>>>> could run once on my system and it could help in creating >>>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>> groups. >>>>>>>>>>>>>>>>> My typical scenario is: >>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun >>>>>>>>>>>>>>>>> 2. connect them into MPI group >>>>>>>>>>>>>>>>> 3. do some job >>>>>>>>>>>>>>>>> 4. exit all N processes >>>>>>>>>>>>>>>>> 5. goto 1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>: >>>>>>>>>>>>>>>>>> Thank you Ralph for your explanation. >>>>>>>>>>>>>>>>>> And, apart from that descriptors' issue, is there any other >>>>>>>>>>>>>>>>>> way to >>>>>>>>>>>>>>>>>> solve my problem, i.e. to run separately a number of >>>>>>>>>>>>>>>>>> processes, >>>>>>>>>>>>>>>>>> without mpirun and then to collect them into an MPI >>>>>>>>>>>>>>>>>> intracomm group? >>>>>>>>>>>>>>>>>> If I for example would need to run some 'server process' >>>>>>>>>>>>>>>>>> (even using >>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and >>>>>>>>>>>>>>>>>>> are not >>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" launch >>>>>>>>>>>>>>>>>>> (i.e., starting >>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of those >>>>>>>>>>>>>>>>>>> processes thinks it is >>>>>>>>>>>>>>>>>>> a singleton - yes? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton immediately >>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to behave >>>>>>>>>>>>>>>>>>> just like mpirun. >>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 operations such >>>>>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are >>>>>>>>>>>>>>>>>>> singletons, then >>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This eats up a >>>>>>>>>>>>>>>>>>> lot of file >>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting this 65 >>>>>>>>>>>>>>>>>>> process limit - >>>>>>>>>>>>>>>>>>> your system is probably running out of file descriptors. >>>>>>>>>>>>>>>>>>> You might check you >>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised upward. 
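Since the working theory at this point was that 64 fork/exec'd orted daemons exhaust the per-process descriptor limit (nofiles is 1024 in the limits quoted above), it is easy to check from inside a process how close it is getting. A rough sketch, not part of the original code: getrlimit() is POSIX, while counting /proc/self/fd is a Linux-only assumption.

    /* Sketch: print the file-descriptor limit and an approximate count of
     * descriptors currently open.  /proc/self/fd is Linux-specific. */

    #include <stdio.h>
    #include <dirent.h>
    #include <sys/resource.h>

    static void report_fd_usage(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
            printf("fd limit: soft=%ld hard=%ld\n",
                   (long)rl.rlim_cur, (long)rl.rlim_max);
        }

        int used = 0;
        DIR *dir = opendir("/proc/self/fd");
        if (dir != NULL) {
            while (readdir(dir) != NULL) {
                used++;                     /* counts ".", ".." and the DIR's own fd */
            }
            closedir(dir);
            printf("fds open (approx): %d\n", used - 3);
        }
    }

If the count climbs toward the soft limit as more singletons join, the descriptor theory holds; otherwise the hard-wired jobid array discussed further up remains the main suspect. Raising nofiles (ulimit -n) is cheap to try either way.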
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use some >>>>>>>>>>>>>>>>>>>> special way for >>>>>>>>>>>>>>>>>>>> running my processes provided by the environment in which >>>>>>>>>>>>>>>>>>>> I'm >>>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>>> and unfortunately I can't use mpirun. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - all >>>>>>>>>>>>>>>>>>>>> it does is >>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. It >>>>>>>>>>>>>>>>>>>>> mainly sits there >>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to support >>>>>>>>>>>>>>>>>>>>> the job. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I >>>>>>>>>>>>>>>>>>>>> know of no >>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes >>>>>>>>>>>>>>>>>>>>>> communicating >>>>>>>>>>>>>>>>>>>>>> via >>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without mpirun and >>>>>>>>>>>>>>>>>>>>>> create >>>>>>>>>>>>>>>>>>>>>> intracommunicator after the startup. Any ideas how to do >>>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>> efficiently? >>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes are >>>>>>>>>>>>>>>>>>>>>> connecting >>>>>>>>>>>>>>>>>>>>>> one by >>>>>>>>>>>>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately all the >>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>> are already in the group need to call MPI_Comm_accept. >>>>>>>>>>>>>>>>>>>>>> This means >>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect I need to collect >>>>>>>>>>>>>>>>>>>>>> all the >>>>>>>>>>>>>>>>>>>>>> n-1 >>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I run about >>>>>>>>>>>>>>>>>>>>>> 40 >>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, which >>>>>>>>>>>>>>>>>>>>>> I'd like to >>>>>>>>>>>>>>>>>>>>>> avoid. >>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I try to >>>>>>>>>>>>>>>>>>>>>> connect >>>>>>>>>>>>>>>>>>>>>> 66-th >>>>>>>>>>>>>>>>>>>>>> process the root of the existing group segfaults on >>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. >>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird as everything works >>>>>>>>>>>>>>>>>>>>>> fine for at >>>>>>>>>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>>>>>>>> 65 processes. Is there any limitation I don't know about? >>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my >>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same as >>>>>>>>>>>>>>>>>>>>>> MPI_COMM_SELF. >>>>>>>>>>>>>>>>>>>>>> Is >>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it to the >>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created? 
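For completeness, the "connect one by one" scheme described in that first message boils down to something like the sketch below. This is a rough outline, not the actual client.c/server.c from this thread: it assumes the port name comes from MPI_Open_port() on the root and reaches the joining processes out of band (e.g. through a file or the ompi-server rendezvous discussed above), and all error checking is omitted.

    #include <mpi.h>

    /* Existing-group side: every process already in 'group' takes part in
     * the accept, which is why each new arrival gets more expensive as the
     * group grows. */
    static MPI_Comm accept_one(MPI_Comm group, const char *port_name)
    {
        MPI_Comm inter, merged;

        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0 /* root */, group, &inter);
        MPI_Intercomm_merge(inter, 0 /* existing ranks ordered first */, &merged);
        MPI_Comm_disconnect(&inter);
        return merged;                      /* the enlarged intracommunicator */
    }

    /* Joining singleton side (MPI_COMM_WORLD == MPI_COMM_SELF here): */
    static MPI_Comm join_group(const char *port_name)
    {
        MPI_Comm inter, merged;

        MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Intercomm_merge(inter, 1 /* newcomer ordered last */, &merged);
        MPI_Comm_disconnect(&inter);
        return merged;
    }

The merged communicator is then passed back in as 'group' for the next arrival. As Ralph notes, there is no way to fold these processes into MPI_COMM_WORLD itself; the merged intracommunicator simply has to be used in its place.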
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj