This falls outside of my purview - I would suggest you post this question again with a different subject line, specifically mentioning the failure of MPI_Intercomm_merge, so that it attracts the attention of those with knowledge of that area.
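For reference, the workaround described in the quoted report below - exchanging a dummy point-to-point message across the intercommunicator before merging it - would look roughly like the following sketch. This is illustrative only: the i_am_server flag and the function name are assumptions, not taken from the attached client.c/server.c.

#include <mpi.h>

/* Exchange a dummy message between the two rank-0 processes of the
 * intercommunicator, then merge it.  The intercommunicator is assumed to
 * come from MPI_Comm_connect/MPI_Comm_accept as elsewhere in this thread. */
static int merge_with_warmup(MPI_Comm intercomm, int i_am_server,
                             MPI_Comm *intracomm)
{
    int local_rank, dummy = 0;

    MPI_Comm_rank(intercomm, &local_rank);      /* rank within the local group */
    if (local_rank == 0) {
        if (i_am_server)
            /* rank 0 of this group -> rank 0 of the remote group */
            MPI_Send(&dummy, 1, MPI_INT, 0, 0, intercomm);
        else
            MPI_Recv(&dummy, 1, MPI_INT, 0, 0, intercomm, MPI_STATUS_IGNORE);
    }
    /* both sides pass high = 0, matching the backtraces below */
    return MPI_Intercomm_merge(intercomm, 0, intracomm);
}

This only sidesteps the hang rather than explaining it, but it matches the observation below that a prior point-to-point exchange lets the merge complete.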
On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:

> So now I have a new question.
> When I run my server and a lot of clients on the same machine, everything looks fine.
>
> But when I try to run the clients on several machines, the most frequent scenario is:
> * the server is started on machine A
> * X (= 1, 4, 10, ...) clients are started on machine B and they connect successfully
> * the first client starting on machine C connects successfully to the server, but the whole grid hangs on MPI_Intercomm_merge (all the processes from the intercommunicator get there).
>
> As I said, that is the most frequent scenario. Sometimes I can connect the clients from several machines. Sometimes it hangs (always on MPI_Intercomm_merge) when connecting the clients from machine B.
> The interesting thing is that if, before MPI_Intercomm_merge, I send a dummy message on the intercommunicator from process rank 0 in one group to process rank 0 in the other one, it will not hang on MPI_Intercomm_merge.
>
> I've tried both versions, with and without the first patch (ompi-server as orted), but it doesn't change the behavior.
>
> I've attached gdb to my server; this is the backtrace:
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0x00637afc in sched_yield () from /lib/libc.so.6
> #2  0xf7e8ce31 in opal_progress () at ../../opal/runtime/opal_progress.c:220
> #3  0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at ../../opal/threads/condition.h:99
> #4  0xf7f60dee in ompi_request_default_wait_all (count=2, requests=0xff8d7754, statuses=0x0) at ../../ompi/request/req_wait.c:262
> #5  0xf7d3e221 in mca_coll_inter_allgatherv_inter (sbuf=0xff8d7824, scount=1, sdtype=0x8049200, rbuf=0xff8d77e0, rcounts=0x9783df8, disps=0x9755520, rdtype=0x8049200, comm=0x978c2a8, module=0x9794b08) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
> #6  0xf7f4c615 in ompi_comm_determine_first (intercomm=0x978c2a8, high=0) at ../../ompi/communicator/comm.c:1199
> #7  0xf7f8d1d9 in PMPI_Intercomm_merge (intercomm=0x978c2a8, high=0, newcomm=0xff8d78c0) at pintercomm_merge.c:84
> #8  0x0804893c in main (argc=Cannot access memory at address 0xf) at server.c:50
>
> And this is the backtrace from one of the clients:
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0x0064993b in poll () from /lib/libc.so.6
> #2  0xf7de027f in poll_dispatch (base=0x8643fb8, arg=0x86442d8, tv=0xff82299c) at ../../../opal/event/poll.c:168
> #3  0xf7dde4b2 in opal_event_base_loop (base=0x8643fb8, flags=2) at ../../../opal/event/event.c:807
> #4  0xf7dde34f in opal_event_loop (flags=2) at ../../../opal/event/event.c:730
> #5  0xf7dcfc77 in opal_progress () at ../../opal/runtime/opal_progress.c:189
> #6  0xf7ea80b8 in opal_condition_wait (c=0xf7f25160, m=0xf7f251a0) at ../../opal/threads/condition.h:99
> #7  0xf7ea7ff3 in ompi_request_wait_completion (req=0x8686680) at ../../ompi/request/request.h:375
> #8  0xf7ea7ef1 in ompi_request_default_wait (req_ptr=0xff822ae8, status=0x0) at ../../ompi/request/req_wait.c:37
> #9  0xf7c663a6 in ompi_coll_tuned_bcast_intra_generic (buffer=0xff822d20, original_count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, count_by_segment=1, tree=0x868b3d8) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:237
> #10 0xf7c668ea in ompi_coll_tuned_bcast_intra_binomial (buffer=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700, segsize=0) at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:368
> #11 0xf7c5af12 in ompi_coll_tuned_bcast_intra_dec_fixed (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x868b700) at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256
> #12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at ../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44
> #13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64, scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188, disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300, module=0x86aae18) at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134
> #14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300, high=0) at ../../ompi/communicator/comm.c:1199
> #15 0xf7ed7833 in PMPI_Intercomm_merge (intercomm=0x86aa300, high=0, newcomm=0xff8241d0) at pintercomm_merge.c:84
> #16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47
>
> What do you think may cause the problem?
>
>
> 2010/7/26 Ralph Castain <r...@open-mpi.org>: >> No problem at all - glad it works! >> >> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote: >> >>> Hi, >>> I'm very sorry, but the problem was on my side. My installation >>> process was not always taking the newest sources of openmpi. In this >>> case it hasn't installed the version with the latest patch. Now I >>> think everything works fine - I could run over 130 processes with no >>> problems. >>> I'm sorry again that I've wasted your time. And thank you for the patch. >>> >>> 2010/7/21 Ralph Castain <r...@open-mpi.org>: >>>> We're having some problem replicating this once my patches are applied. >>>> Can you send us your configure cmd? Just the output from "head config.log" >>>> will do for now. >>>> >>>> Thanks! >>>> >>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote: >>>> >>>>> My start script looks almost exactly the same as the one published by >>>>> Edgar, ie. the processes are starting one by one with no delay. >>>>> >>>>> 2010/7/20 Ralph Castain <r...@open-mpi.org>: >>>>>> Grzegorz: something occurred to me. When you start all these processes, >>>>>> how are you staggering their wireup? Are they flooding us, or are you >>>>>> time-shifting them a little? >>>>>> >>>>>> >>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote: >>>>>> >>>>>>> Hm, so I am not sure how to approach this. First of all, the test case >>>>>>> works for me. I used up to 80 clients, and for both optimized and >>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4 >>>>>>> series, but the communicator code is identical in both cases). Clearly, >>>>>>> the patch from Ralph is necessary to make it work. >>>>>>> >>>>>>> Additionally, I went through the communicator creation code for dynamic >>>>>>> communicators trying to find spots that could create problems. The only >>>>>>> place that I found the number 64 appear is the fortran-to-c mapping >>>>>>> arrays (e.g. for communicators), where the initial size of the table is >>>>>>> 64. I looked twice over the pointer-array code to see whether we could >>>>>>> have a problem there (since it is a key piece of the cid allocation code >>>>>>> for communicators), but I am fairly confident that it is correct. >>>>>>> >>>>>>> Note that we have other (non-dynamic) tests, where comm_set is called >>>>>>> 100,000 times, and the code per se does not seem to have a problem due >>>>>>> to being called too often. So I am not sure what else to look at.
>>>>>>> >>>>>>> Edgar >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote: >>>>>>>> As far as I can tell, it appears the problem is somewhere in our >>>>>>>> communicator setup. The people knowledgeable on that area are going to >>>>>>>> look into it later this week. >>>>>>>> >>>>>>>> I'm creating a ticket to track the problem and will copy you on it. >>>>>>>> >>>>>>>> >>>>>>>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote: >>>>>>>>> >>>>>>>>>> Bad news.. >>>>>>>>>> I've tried the latest patch with and without the prior one, but it >>>>>>>>>> hasn't changed anything. I've also tried using the old code but with >>>>>>>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also >>>>>>>>>> didn't >>>>>>>>>> help. >>>>>>>>>> While looking through the sources of openmpi-1.4.2 I couldn't find >>>>>>>>>> any >>>>>>>>>> call of the function ompi_dpm_base_mark_dyncomm. >>>>>>>>> >>>>>>>>> It isn't directly called - it shows in ompi_comm_set as >>>>>>>>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, >>>>>>>>> but I guess something else is also being hit. Have to look further... >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>> Just so you don't have to wait for 1.4.3 release, here is the patch >>>>>>>>>>> (doesn't include the prior patch). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote: >>>>>>>>>>> >>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>> Dug around a bit and found the problem!! >>>>>>>>>>>>> >>>>>>>>>>>>> I have no idea who or why this was done, but somebody set a limit >>>>>>>>>>>>> of 64 separate jobids in the dynamic init called by >>>>>>>>>>>>> ompi_comm_set, which builds the intercommunicator. Unfortunately, >>>>>>>>>>>>> they hard-wired the array size, but never check that size before >>>>>>>>>>>>> adding to it. >>>>>>>>>>>>> >>>>>>>>>>>>> So after 64 calls to connect_accept, you are overwriting other >>>>>>>>>>>>> areas of the code. As you found, hitting 66 causes it to segfault. >>>>>>>>>>>>> >>>>>>>>>>>>> I'll fix this on the developer's trunk (I'll also add that >>>>>>>>>>>>> original patch to it). Rather than my searching this thread in >>>>>>>>>>>>> detail, can you remind me what version you are using so I can >>>>>>>>>>>>> patch it too? >>>>>>>>>>>> >>>>>>>>>>>> I'm using 1.4.2 >>>>>>>>>>>> Thanks a lot and I'm looking forward for the patch. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your patience with this! >>>>>>>>>>>>> Ralph >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't change >>>>>>>>>>>>>> anything. >>>>>>>>>>>>>> Following your advice I've run my process using gdb. >>>>>>>>>>>>>> Unfortunately I >>>>>>>>>>>>>> didn't get anything more than: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>>>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)] >>>>>>>>>>>>>> 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> (gdb) bt >>>>>>>>>>>>>> #0 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>> #1 0xf7e3ba95 in connect_accept () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so >>>>>>>>>>>>>> #2 0xf7f62013 in PMPI_Comm_connect () from >>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>> #3 0x080489ed in main (argc=825832753, argv=0x34393638) at >>>>>>>>>>>>>> client.c:43 >>>>>>>>>>>>>> >>>>>>>>>>>>>> What's more: when I've added a breakpoint on ompi_comm_set in >>>>>>>>>>>>>> 66th >>>>>>>>>>>>>> process and stepped a couple of instructions, one of the other >>>>>>>>>>>>>> processes crashed (as usualy on ompi_comm_set) earlier than 66th >>>>>>>>>>>>>> did. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Finally I decided to recompile openmpi using -g flag for gcc. In >>>>>>>>>>>>>> this >>>>>>>>>>>>>> case the 66 processes issue has gone! I was running my >>>>>>>>>>>>>> applications >>>>>>>>>>>>>> exactly the same way as previously (even without recompilation) >>>>>>>>>>>>>> and >>>>>>>>>>>>>> I've run successfully over 130 processes. >>>>>>>>>>>>>> When switching back to the openmpi compilation without -g it >>>>>>>>>>>>>> again segfaults. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas? I'm really confused. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves >>>>>>>>>>>>>>> the same way when spread across multiple machines, I would >>>>>>>>>>>>>>> suspect it is somewhere in your program itself. Given that the >>>>>>>>>>>>>>> segfault is in your process, can you use gdb to look at the >>>>>>>>>>>>>>> core file and see where and why it fails? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>> sorry for the late response, but I couldn't find free time >>>>>>>>>>>>>>>>>> to play >>>>>>>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. I've >>>>>>>>>>>>>>>>>> launched >>>>>>>>>>>>>>>>>> my processes in the way you've described and I think it's >>>>>>>>>>>>>>>>>> working as >>>>>>>>>>>>>>>>>> you expected. None of my processes runs the orted daemon and >>>>>>>>>>>>>>>>>> they can >>>>>>>>>>>>>>>>>> perform MPI operations. Unfortunately I'm still hitting the >>>>>>>>>>>>>>>>>> 65 >>>>>>>>>>>>>>>>>> processes issue :( >>>>>>>>>>>>>>>>>> Maybe I'm doing something wrong. >>>>>>>>>>>>>>>>>> I attach my source code. If anybody could have a look on >>>>>>>>>>>>>>>>>> this, I would >>>>>>>>>>>>>>>>>> be grateful. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> When I run that code with clients_count <= 65 everything >>>>>>>>>>>>>>>>>> works fine: >>>>>>>>>>>>>>>>>> all the processes create a common grid, exchange some >>>>>>>>>>>>>>>>>> information and >>>>>>>>>>>>>>>>>> disconnect. >>>>>>>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on >>>>>>>>>>>>>>>>>> MPI_Comm_connect (segmentation fault). 
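For context, the pattern being debugged in this thread - growing one intracommunicator by accepting singleton clients one at a time - typically looks like the sketch below. The function names and port handling are assumptions for illustration and are not taken from the attached client.c/server.c.

#include <mpi.h>

/* Accepting side: every process already in 'grid' must take part in the
 * accept, then everyone merges the resulting intercommunicator.  The merged
 * communicator stays valid after the intercommunicator is disconnected. */
static MPI_Comm accept_one_client(MPI_Comm grid, const char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_accept(port, MPI_INFO_NULL, 0 /* root */, grid, &inter);
    MPI_Intercomm_merge(inter, 0 /* high, as in the backtraces */, &merged);
    MPI_Comm_disconnect(&inter);
    return merged;                 /* the grid, grown by one process */
}

/* Joining side: a singleton connects with MPI_COMM_SELF as its local comm. */
static MPI_Comm join_grid(const char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 0, &merged);
    MPI_Comm_disconnect(&inter);
    return merged;
}

Note that MPI_Comm_accept is collective over everything already in the grid, so every previously connected process has to participate each time a new client joins, and the cost of each join grows with the size of the grid.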
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I didn't have time to check the code, but my guess is that >>>>>>>>>>>>>>>>> you are still hitting some kind of file descriptor or other >>>>>>>>>>>>>>>>> limit. Check to see what your limits are - usually "ulimit" >>>>>>>>>>>>>>>>> will tell you. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My limitations are: >>>>>>>>>>>>>>>> time(seconds) unlimited >>>>>>>>>>>>>>>> file(blocks) unlimited >>>>>>>>>>>>>>>> data(kb) unlimited >>>>>>>>>>>>>>>> stack(kb) 10240 >>>>>>>>>>>>>>>> coredump(blocks) 0 >>>>>>>>>>>>>>>> memory(kb) unlimited >>>>>>>>>>>>>>>> locked memory(kb) 64 >>>>>>>>>>>>>>>> process 200704 >>>>>>>>>>>>>>>> nofiles 1024 >>>>>>>>>>>>>>>> vmemory(kb) unlimited >>>>>>>>>>>>>>>> locks unlimited >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Which one do you think could be responsible for that? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I was trying to run all the 66 processes on one machine or >>>>>>>>>>>>>>>> spread them >>>>>>>>>>>>>>>> across several machines and it always crashes the same way on >>>>>>>>>>>>>>>> the 66th >>>>>>>>>>>>>>>> process. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Another thing I would like to know is if it's normal that >>>>>>>>>>>>>>>>>> any of my >>>>>>>>>>>>>>>>>> processes when calling MPI_Comm_connect or MPI_Comm_accept >>>>>>>>>>>>>>>>>> when the >>>>>>>>>>>>>>>>>> other side is not ready, is eating up a full CPU available. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes - the waiting process is polling in a tight loop waiting >>>>>>>>>>>>>>>>> for the connection to be made. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Any help would be appreciated, >>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does >>>>>>>>>>>>>>>>>>> pretty much what you >>>>>>>>>>>>>>>>>>> want. Checkout "man ompi-server". I originally wrote that >>>>>>>>>>>>>>>>>>> code to support >>>>>>>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but we >>>>>>>>>>>>>>>>>>> can utilize it >>>>>>>>>>>>>>>>>>> here too. Have to blame me for not making it more publicly >>>>>>>>>>>>>>>>>>> known. >>>>>>>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the >>>>>>>>>>>>>>>>>>> singleton startup >>>>>>>>>>>>>>>>>>> to provide your desired support. This solution works in the >>>>>>>>>>>>>>>>>>> following >>>>>>>>>>>>>>>>>>> manner: >>>>>>>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This starts >>>>>>>>>>>>>>>>>>> a persistent >>>>>>>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous point >>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> independently started applications. The problem with >>>>>>>>>>>>>>>>>>> starting different >>>>>>>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies in >>>>>>>>>>>>>>>>>>> the need to have >>>>>>>>>>>>>>>>>>> the applications find each other. If they can't discover >>>>>>>>>>>>>>>>>>> contact info for >>>>>>>>>>>>>>>>>>> the other app, then they can't wire up their interconnects. >>>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I don't >>>>>>>>>>>>>>>>>>> like that >>>>>>>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out. >>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename>" in the >>>>>>>>>>>>>>>>>>> environment where you >>>>>>>>>>>>>>>>>>> will start your processes. 
This will allow your singleton >>>>>>>>>>>>>>>>>>> processes to find >>>>>>>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to >>>>>>>>>>>>>>>>>>> connect the MPI >>>>>>>>>>>>>>>>>>> publish/subscribe system for you. >>>>>>>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, >>>>>>>>>>>>>>>>>>> they will detect >>>>>>>>>>>>>>>>>>> the presence of the above envar and automatically connect >>>>>>>>>>>>>>>>>>> themselves to the >>>>>>>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with the >>>>>>>>>>>>>>>>>>> ability to perform >>>>>>>>>>>>>>>>>>> any MPI-2 operation. >>>>>>>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully it >>>>>>>>>>>>>>>>>>> will meet your >>>>>>>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so >>>>>>>>>>>>>>>>>>> long as you locate >>>>>>>>>>>>>>>>>>> it where all of the processes can find the contact file and >>>>>>>>>>>>>>>>>>> can open a TCP >>>>>>>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple >>>>>>>>>>>>>>>>>>> ompi-servers into a >>>>>>>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot >>>>>>>>>>>>>>>>>>> directly access a >>>>>>>>>>>>>>>>>>> server due to network segmentation), but it's a tad tricky >>>>>>>>>>>>>>>>>>> - let me know if >>>>>>>>>>>>>>>>>>> you require it and I'll try to help. >>>>>>>>>>>>>>>>>>> If you have trouble wiring them all into a single >>>>>>>>>>>>>>>>>>> communicator, you might >>>>>>>>>>>>>>>>>>> ask separately about that and see if one of our MPI experts >>>>>>>>>>>>>>>>>>> can provide >>>>>>>>>>>>>>>>>>> advice (I'm just the RTE grunt). >>>>>>>>>>>>>>>>>>> HTH - let me know how this works for you and I'll >>>>>>>>>>>>>>>>>>> incorporate it into future >>>>>>>>>>>>>>>>>>> OMPI releases. >>>>>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this our >>>>>>>>>>>>>>>>>>> small >>>>>>>>>>>>>>>>>>> project/experiment. >>>>>>>>>>>>>>>>>>> We definitely would like to give your patch a try. But >>>>>>>>>>>>>>>>>>> could you please >>>>>>>>>>>>>>>>>>> explain your solution a little more? >>>>>>>>>>>>>>>>>>> You still would like to start one mpirun per mpi grid, and >>>>>>>>>>>>>>>>>>> then have >>>>>>>>>>>>>>>>>>> processes started by us to join the MPI comm? >>>>>>>>>>>>>>>>>>> It is a good solution of course. >>>>>>>>>>>>>>>>>>> But it would be especially preferable to have one daemon >>>>>>>>>>>>>>>>>>> running >>>>>>>>>>>>>>>>>>> persistently on our "entry" machine that can handle several >>>>>>>>>>>>>>>>>>> mpi grid starts. >>>>>>>>>>>>>>>>>>> Can your patch help us this way too? >>>>>>>>>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>>>>>>>>> Krzysztof >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In thinking about this, my proposed solution won't >>>>>>>>>>>>>>>>>>>> entirely fix the >>>>>>>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I >>>>>>>>>>>>>>>>>>>> believe I can >>>>>>>>>>>>>>>>>>>> resolve that one as well, but it would require a patch. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Would you like me to send you something you could try? >>>>>>>>>>>>>>>>>>>> Might take a couple >>>>>>>>>>>>>>>>>>>> of iterations to get it right... 
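The ompi-server rendezvous described in the numbered steps above is also what MPI-2 name publishing needs when the applications are started independently. A minimal sketch of that usage follows; the service name "grid-server" is an assumption, and this shows one common way to use the rendezvous rather than what the attached sources actually do.

#include <mpi.h>

/* With OMPI_MCA_orte_server pointing at the ompi-server URI file (step 2
 * above), independently started singletons can meet through name publishing. */

static void server_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("grid-server", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
}

static void client_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];

    MPI_Lookup_name("grid-server", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
}

The ompi-server daemon itself only gives MPI_Publish_name/MPI_Lookup_name a common meeting point when the jobs were not started under one mpirun.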
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee >>>>>>>>>>>>>>>>>>>>> it: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using >>>>>>>>>>>>>>>>>>>>> mpirun that includes >>>>>>>>>>>>>>>>>>>>> the following option: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> mpirun -report-uri file >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> where file is some filename that mpirun can create and >>>>>>>>>>>>>>>>>>>>> insert its >>>>>>>>>>>>>>>>>>>>> contact info into it. This can be a relative or absolute >>>>>>>>>>>>>>>>>>>>> path. This process >>>>>>>>>>>>>>>>>>>>> must remain alive throughout your application - doesn't >>>>>>>>>>>>>>>>>>>>> matter what it does. >>>>>>>>>>>>>>>>>>>>> It's purpose is solely to keep mpirun alive. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your >>>>>>>>>>>>>>>>>>>>> environment, where >>>>>>>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your >>>>>>>>>>>>>>>>>>>>> processes how to >>>>>>>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to handle >>>>>>>>>>>>>>>>>>>>> the connect/accept >>>>>>>>>>>>>>>>>>>>> operations >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to >>>>>>>>>>>>>>>>>>>>> each other. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that >>>>>>>>>>>>>>>>>>>>> these processes >>>>>>>>>>>>>>>>>>>>> will all have the same rank && name since they all start >>>>>>>>>>>>>>>>>>>>> as singletons. >>>>>>>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some >>>>>>>>>>>>>>>>>>>>>> process that I >>>>>>>>>>>>>>>>>>>>>> could run once on my system and it could help in >>>>>>>>>>>>>>>>>>>>>> creating those >>>>>>>>>>>>>>>>>>>>>> groups. >>>>>>>>>>>>>>>>>>>>>> My typical scenario is: >>>>>>>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun >>>>>>>>>>>>>>>>>>>>>> 2. connect them into MPI group >>>>>>>>>>>>>>>>>>>>>> 3. do some job >>>>>>>>>>>>>>>>>>>>>> 4. exit all N processes >>>>>>>>>>>>>>>>>>>>>> 5. goto 1 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>: >>>>>>>>>>>>>>>>>>>>>>> Thank you Ralph for your explanation. >>>>>>>>>>>>>>>>>>>>>>> And, apart from that descriptors' issue, is there any >>>>>>>>>>>>>>>>>>>>>>> other way to >>>>>>>>>>>>>>>>>>>>>>> solve my problem, i.e. to run separately a number of >>>>>>>>>>>>>>>>>>>>>>> processes, >>>>>>>>>>>>>>>>>>>>>>> without mpirun and then to collect them into an MPI >>>>>>>>>>>>>>>>>>>>>>> intracomm group? >>>>>>>>>>>>>>>>>>>>>>> If I for example would need to run some 'server >>>>>>>>>>>>>>>>>>>>>>> process' (even using >>>>>>>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>> Okay, but here is the problem. 
If you don't use >>>>>>>>>>>>>>>>>>>>>>>> mpirun, and are not >>>>>>>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" >>>>>>>>>>>>>>>>>>>>>>>> launch (i.e., starting >>>>>>>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of those >>>>>>>>>>>>>>>>>>>>>>>> processes thinks it is >>>>>>>>>>>>>>>>>>>>>>>> a singleton - yes? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton >>>>>>>>>>>>>>>>>>>>>>>> immediately >>>>>>>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to >>>>>>>>>>>>>>>>>>>>>>>> behave just like mpirun. >>>>>>>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 operations >>>>>>>>>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are >>>>>>>>>>>>>>>>>>>>>>>> singletons, then >>>>>>>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This eats >>>>>>>>>>>>>>>>>>>>>>>> up a lot of file >>>>>>>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting >>>>>>>>>>>>>>>>>>>>>>>> this 65 process limit - >>>>>>>>>>>>>>>>>>>>>>>> your system is probably running out of file >>>>>>>>>>>>>>>>>>>>>>>> descriptors. You might check you >>>>>>>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised >>>>>>>>>>>>>>>>>>>>>>>> upward. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use some >>>>>>>>>>>>>>>>>>>>>>>>> special way for >>>>>>>>>>>>>>>>>>>>>>>>> running my processes provided by the environment in >>>>>>>>>>>>>>>>>>>>>>>>> which I'm >>>>>>>>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>>>>>>>> and unfortunately I can't use mpirun. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - >>>>>>>>>>>>>>>>>>>>>>>>>> all it does is >>>>>>>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. It >>>>>>>>>>>>>>>>>>>>>>>>>> mainly sits there >>>>>>>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to >>>>>>>>>>>>>>>>>>>>>>>>>> support the job. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, >>>>>>>>>>>>>>>>>>>>>>>>>> I know of no >>>>>>>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes >>>>>>>>>>>>>>>>>>>>>>>>>>> communicating >>>>>>>>>>>>>>>>>>>>>>>>>>> via >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without mpirun >>>>>>>>>>>>>>>>>>>>>>>>>>> and create >>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator after the startup. Any ideas how >>>>>>>>>>>>>>>>>>>>>>>>>>> to do this >>>>>>>>>>>>>>>>>>>>>>>>>>> efficiently? 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes >>>>>>>>>>>>>>>>>>>>>>>>>>> are connecting >>>>>>>>>>>>>>>>>>>>>>>>>>> one by >>>>>>>>>>>>>>>>>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately all >>>>>>>>>>>>>>>>>>>>>>>>>>> the processes >>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>> are already in the group need to call >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. This means >>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect I need to >>>>>>>>>>>>>>>>>>>>>>>>>>> collect all the >>>>>>>>>>>>>>>>>>>>>>>>>>> n-1 >>>>>>>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I run >>>>>>>>>>>>>>>>>>>>>>>>>>> about 40 >>>>>>>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, >>>>>>>>>>>>>>>>>>>>>>>>>>> which I'd like to >>>>>>>>>>>>>>>>>>>>>>>>>>> avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I try >>>>>>>>>>>>>>>>>>>>>>>>>>> to connect >>>>>>>>>>>>>>>>>>>>>>>>>>> 66-th >>>>>>>>>>>>>>>>>>>>>>>>>>> process the root of the existing group segfaults on >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. >>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird as everything >>>>>>>>>>>>>>>>>>>>>>>>>>> works fine for at >>>>>>>>>>>>>>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>>>>>>>>>>>>> 65 processes. Is there any limitation I don't know >>>>>>>>>>>>>>>>>>>>>>>>>>> about? >>>>>>>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I >>>>>>>>>>>>>>>>>>>>>>>>>>> run my processes >>>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same as >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_COMM_SELF. >>>>>>>>>>>>>>>>>>>>>>>>>>> Is >>>>>>>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it >>>>>>>>>>>>>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created? 
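On the MPI_COMM_WORLD question raised just above: a singleton's MPI_COMM_WORLD contains only the process itself, and standard MPI provides no way to reassign it. The usual approach is to keep the merged intracommunicator around (here a global named grid_comm, which is an assumed name for illustration) and use it wherever MPI_COMM_WORLD would otherwise appear.

#include <stdio.h>
#include <mpi.h>

static MPI_Comm grid_comm = MPI_COMM_NULL;   /* assumed global, set right after the merge */

/* Call only after grid_comm has been set to the merged intracommunicator. */
static void report_sizes(void)
{
    int world_size, grid_size;

    MPI_Comm_size(MPI_COMM_WORLD, &world_size);  /* 1 for a singleton */
    MPI_Comm_size(grid_comm, &grid_size);        /* everyone that has merged in */
    printf("world=%d grid=%d\n", world_size, grid_size);
}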
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>>>>>> <client.c><server.c>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users