Just so you don't have to wait for the 1.4.3 release, here is the patch (it doesn't include the prior patch).
dpm.diff
Description: Binary data
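
For orientation, the gist of the problem is a hard-wired 64-entry jobid table with no bounds check on insertion. The sketch below is illustrative only (made-up names, not the actual Open MPI internals - see dpm.diff for the real change); it just shows the pattern being fixed:

#include <stdint.h>

#define MAX_JOBIDS 64                 /* hypothetical hard-wired limit */

uint32_t jobid_table[MAX_JOBIDS];
int      num_jobids = 0;

/* Broken pattern: every connect/accept adds an entry, and nothing stops
 * entry number 65 from scribbling past the end of the array:
 *
 *     jobid_table[num_jobids++] = new_jobid;
 *
 * Safer pattern: refuse (or grow the table) once the limit is reached.  */
int add_jobid(uint32_t new_jobid)
{
    if (num_jobids >= MAX_JOBIDS) {
        return -1;                    /* report an error instead of corrupting memory */
    }
    jobid_table[num_jobids++] = new_jobid;
    return 0;
}
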
On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:

> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>> Dug around a bit and found the problem!!
>>
>> I have no idea who or why this was done, but somebody set a limit of 64 separate jobids in the dynamic init called by ompi_comm_set, which builds the intercommunicator. Unfortunately, they hard-wired the array size, and that size is never checked before adding to it.
>>
>> So after 64 calls to connect_accept, you are overwriting other areas of the code. As you found, hitting 66 causes it to segfault.
>>
>> I'll fix this on the developer's trunk (I'll also add that original patch to it). Rather than my searching this thread in detail, can you remind me what version you are using so I can patch it too?
>
> I'm using 1.4.2.
> Thanks a lot; I'm looking forward to the patch.
>
>> Thanks for your patience with this!
>> Ralph
>>
>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>>
>>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>>> Following your advice I've run my process under gdb. Unfortunately I didn't get anything more than:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>
>>> (gdb) bt
>>> #0  0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>> #1  0xf7e3ba95 in connect_accept () from /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>>> #2  0xf7f62013 in PMPI_Comm_connect () from /home/gmaj/openmpi/lib/libmpi.so.0
>>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>>
>>> What's more: when I added a breakpoint on ompi_comm_set in the 66th process and stepped through a couple of instructions, one of the other processes crashed (as usual, in ompi_comm_set) earlier than the 66th did.
>>>
>>> Finally I decided to recompile Open MPI with the -g flag for gcc. In that case the 66-process issue was gone! I was running my applications exactly the same way as before (without even recompiling them) and I successfully ran over 130 processes. When I switch back to the Open MPI build without -g, it segfaults again.
>>>
>>> Any ideas? I'm really confused.
>>>
>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>> I would guess the #files limit of 1024. However, if it behaves the same way when spread across multiple machines, I would suspect it is somewhere in your program itself. Given that the segfault is in your process, can you use gdb to look at the core file and see where and why it fails?
>>>>
>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>>
>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>>
>>>>>>> Hi Ralph,
>>>>>>> sorry for the late response, but I couldn't find free time to play with this. Finally I've applied the patch you prepared. I've launched my processes the way you described and I think it's working as you expected. None of my processes runs the orted daemon and they can perform MPI operations. Unfortunately I'm still hitting the 65-process issue :(
>>>>>>> Maybe I'm doing something wrong. I attach my source code. If anybody could have a look at it, I would be grateful.
>>>>>>> When I run that code with clients_count <= 65 everything works fine: all the processes create a common grid, exchange some information and disconnect.
>>>>>>> When I set clients_count > 65, the 66th process crashes on MPI_Comm_connect (segmentation fault).
>>>>>>
>>>>>> I didn't have time to check the code, but my guess is that you are still hitting some kind of file descriptor or other limit. Check to see what your limits are - usually "ulimit" will tell you.
>>>>>
>>>>> My limits are:
>>>>> time(seconds)        unlimited
>>>>> file(blocks)         unlimited
>>>>> data(kb)             unlimited
>>>>> stack(kb)            10240
>>>>> coredump(blocks)     0
>>>>> memory(kb)           unlimited
>>>>> locked memory(kb)    64
>>>>> process              200704
>>>>> nofiles              1024
>>>>> vmemory(kb)          unlimited
>>>>> locks                unlimited
>>>>>
>>>>> Which one do you think could be responsible for that?
>>>>>
>>>>> I tried running all 66 processes on one machine and spreading them across several machines, and it always crashes the same way on the 66th process.
>>>>>
>>>>>>> Another thing I would like to know: is it normal that when any of my processes calls MPI_Comm_connect or MPI_Comm_accept and the other side is not ready, it eats up a full CPU?
>>>>>>
>>>>>> Yes - the waiting process is polling in a tight loop waiting for the connection to be made.
>>>>>>>
>>>>>>> Any help would be appreciated,
>>>>>>> Grzegorz Maj
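
As an aside on the nofiles limit discussed just above: you can also check it, and raise the soft limit up to the hard limit, from inside the process itself. A minimal POSIX sketch, nothing Open MPI specific:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* RLIMIT_NOFILE is the per-process open-file-descriptor limit that
     * "ulimit -n" reports (the "nofiles 1024" entry in the listing above). */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long) rl.rlim_cur,
           (unsigned long long) rl.rlim_max);

    /* The soft limit can be raised up to the hard limit without privileges;
     * going beyond the hard limit needs root or a limits.conf change.      */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
    }
    return 0;
}
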
>>>>>>>
>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>:
>>>>>>>> Actually, OMPI is distributed with a daemon that does pretty much what you want. Check out "man ompi-server". I originally wrote that code to support cross-application MPI publish/subscribe operations, but we can utilize it here too. Have to blame me for not making it more publicly known.
>>>>>>>> The attached patch upgrades ompi-server and modifies the singleton startup to provide your desired support. This solution works in the following manner:
>>>>>>>> 1. Launch "ompi-server -report-uri <filename>". This starts a persistent daemon called "ompi-server" that acts as a rendezvous point for independently started applications. The problem with starting different applications and wanting them to MPI connect/accept lies in the need to have the applications find each other. If they can't discover contact info for the other app, then they can't wire up their interconnects. The "ompi-server" tool provides that rendezvous point. I don't like that comm_accept segfaulted - it should have just errored out.
>>>>>>>> 2. Set OMPI_MCA_orte_server=file:<filename> in the environment where you will start your processes. This will allow your singleton processes to find the ompi-server. I also automatically set the envar to connect the MPI publish/subscribe system for you.
>>>>>>>> 3. Run your processes. As they think they are singletons, they will detect the presence of the above envar and automatically connect themselves to the "ompi-server" daemon. This provides each process with the ability to perform any MPI-2 operation.
>>>>>>>> I tested this on my machines and it worked, so hopefully it will meet your needs. You only need to run one "ompi-server", period, so long as you locate it where all of the processes can find the contact file and can open a TCP socket to the daemon. There is a way to knit multiple ompi-servers into a broader network (e.g., to connect processes that cannot directly access a server due to network segmentation), but it's a tad tricky - let me know if you require it and I'll try to help.
>>>>>>>> If you have trouble wiring them all into a single communicator, you might ask separately about that and see if one of our MPI experts can provide advice (I'm just the RTE grunt).
>>>>>>>> HTH - let me know how this works for you and I'll incorporate it into future OMPI releases.
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>>>>>>>
>>>>>>>> Hi Ralph,
>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on our small project/experiment.
>>>>>>>> We definitely would like to give your patch a try. But could you please explain your solution a little more? You would still start one mpirun per MPI grid, and then have the processes started by us join the MPI communicator?
>>>>>>>> It is a good solution of course, but it would be especially preferable to have one daemon running persistently on our "entry" machine that can handle several MPI grid starts. Can your patch help us this way too?
>>>>>>>> Thanks for your help!
>>>>>>>> Krzysztof
>>>>>>>>
>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>
>>>>>>>>> In thinking about this, my proposed solution won't entirely fix the problem - you'll still wind up with all those daemons. I believe I can resolve that one as well, but it would require a patch.
>>>>>>>>>
>>>>>>>>> Would you like me to send you something you could try? It might take a couple of iterations to get it right...
>>>>>>>>>
>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>>>>>>
>>>>>>>>>> Hmmm... I -think- this will work, but I cannot guarantee it:
>>>>>>>>>>
>>>>>>>>>> 1. Launch one process (it can just be a spinner) using mpirun with the following option:
>>>>>>>>>>
>>>>>>>>>> mpirun -report-uri file
>>>>>>>>>>
>>>>>>>>>> where "file" is some filename that mpirun can create and insert its contact info into. This can be a relative or absolute path. This process must remain alive throughout your application - it doesn't matter what it does. Its purpose is solely to keep mpirun alive.
>>>>>>>>>>
>>>>>>>>>> 2. Set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where "file" is the filename given above. This will tell your processes how to find mpirun, which is acting as a meeting place to handle the connect/accept operations.
>>>>>>>>>>
>>>>>>>>>> Now run your processes, and have them connect/accept to each other.
>>>>>>>>>>
>>>>>>>>>> The reason I cannot guarantee this will work is that these processes will all have the same rank and name, since they all start as singletons. Hence, connect/accept is likely to fail.
>>>>>>>>>>
>>>>>>>>>> But it -might- work, so you might want to give it a try.
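
To make the connect/accept wiring concrete, here is a rough sketch of the two sides once a rendezvous point is in place (either ompi-server with OMPI_MCA_orte_server=file:<filename>, or mpirun with OMPI_MCA_dpm_orte_server=FILE:file as above). The service name "my-group" and the role selection via argv are made up for the example - this is not the attached client.c/server.c:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char     port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1 && strcmp(argv[1], "accept") == 0) {
        /* Accepting side: open a port and publish it under a made-up
         * service name so the other side can find it via the rendezvous. */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("my-group", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Unpublish_name("my-group", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    } else {
        /* Connecting side: look the port up and connect to it.           */
        MPI_Lookup_name("my-group", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    /* "inter" is now an intercommunicator between the two sides; use it,
     * then tear it down.                                                  */
    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}

The connecting side obviously has to start after the accepting side has published the name, and both need the rendezvous envar in their environment.
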
>>>>>>>>>>
>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>>>>>>
>>>>>>>>>>> To be more precise: by 'server process' I mean some process that I could run once on my system and that could help in creating those groups.
>>>>>>>>>>> My typical scenario is:
>>>>>>>>>>> 1. run N separate processes, each without mpirun
>>>>>>>>>>> 2. connect them into an MPI group
>>>>>>>>>>> 3. do some job
>>>>>>>>>>> 4. exit all N processes
>>>>>>>>>>> 5. go to 1
>>>>>>>>>>>
>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>>>>>>>>>>>> Thank you Ralph for your explanation.
>>>>>>>>>>>> Apart from that descriptor issue, is there any other way to solve my problem, i.e. to run a number of processes separately, without mpirun, and then collect them into an MPI intracommunicator group? If, for example, I would need to run some 'server process' (even using mpirun) for this task, that's OK. Any ideas?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>
>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and are not operating in an environment we support for "direct" launch (i.e., starting processes outside of mpirun), then every one of those processes thinks it is a singleton - yes?
>>>>>>>>>>>>>
>>>>>>>>>>>>> What you may not realize is that each singleton immediately fork/exec's an orted daemon that is configured to behave just like mpirun. This is required in order to support MPI-2 operations such as MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So if you launch 64 processes that think they are singletons, then you have 64 copies of orted running as well. This eats up a lot of file descriptors, which is probably why you are hitting this 65-process limit - your system is probably running out of file descriptors. You might check your system limits and see if you can get them revised upward.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use a special way of running my processes, provided by the environment in which I'm working, and unfortunately I can't use mpirun.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - all it does is start things, provide a means to forward io, etc. It mainly sits there quietly, without using any cpu, unless required to support the job.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I know of no way to get all these processes into comm_world.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes communicating via MPI.
>>>>>>>>>>>>>>>> Those processes need to be run without mpirun and need to create an intracommunicator after startup. Any ideas how to do this efficiently?
>>>>>>>>>>>>>>>> I came up with a solution in which the processes connect one by one using MPI_Comm_connect, but unfortunately all the processes that are already in the group need to call MPI_Comm_accept. This means that when the n-th process wants to connect, I need to collect all n-1 processes on the MPI_Comm_accept call. After I run about 40 processes, every subsequent call takes more and more time, which I'd like to avoid.
>>>>>>>>>>>>>>>> Another problem with this solution is that when I try to connect the 66th process, the root of the existing group segfaults on MPI_Comm_accept. Maybe it's my bug, but it's weird, as everything works fine for at most 65 processes. Is there any limitation I don't know about?
>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes without mpirun, their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is there any way to change MPI_COMM_WORLD and set it to the intracommunicator that I've created?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Grzegorz Maj
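
For reference, the one-by-one joining scheme described above - every current member participates in the accept, and the resulting intercommunicator is merged back into a larger intracommunicator - looks roughly like the sketch below. This is a sketch under those assumptions, not the attached code; "port" is whatever port name the root of the existing group obtained from MPI_Open_port and handed out:

#include <mpi.h>

/* Called collectively by every current member of "group"; the root of
 * "group" owns the port.                                                 */
MPI_Comm accept_one(char *port, MPI_Comm group)
{
    MPI_Comm inter, merged;

    MPI_Comm_accept(port, MPI_INFO_NULL, 0, group, &inter);
    MPI_Intercomm_merge(inter, 0, &merged);   /* existing members keep the low ranks */
    MPI_Comm_free(&inter);
    return merged;                            /* the enlarged intracommunicator */
}

/* Called by the newcomer, which starts out alone (MPI_COMM_SELF).        */
MPI_Comm join(char *port)
{
    MPI_Comm inter, merged;

    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 1, &merged);   /* newcomer goes to the high end */
    MPI_Comm_free(&inter);
    return merged;
}

Every join is collective over all current members, which is why later connections take longer and longer, and each connect/accept creates another communicator/jobid entry - consistent with things falling over right at the 65/66 boundary once the hard-wired 64-entry table described at the top of this mail fills up.
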