Hm, this actually looks correct. The question now is basically why the intermediate handshake between the rank-0 processes on the inter-communicator never finishes. I am wondering whether this could be related to a problem reported in another thread ("Processes stuck after MPI_Waitall() in 1.4.1")?
http://www.open-mpi.org/community/lists/users/2010/07/13720.php On 7/28/2010 4:01 AM, Grzegorz Maj wrote: > I've attached gdb to the client which has just connected to the grid. > Its bt is almost exactly the same as the server's one: > #0 0x428066d7 in sched_yield () from /lib/libc.so.6 > #1 0x00933cbf in opal_progress () at ../../opal/runtime/opal_progress.c:220 > #2 0x00d460b8 in opal_condition_wait (c=0xdc3160, m=0xdc31a0) at > ../../opal/threads/condition.h:99 > #3 0x00d463cc in ompi_request_default_wait_all (count=2, > requests=0xff8a36d0, statuses=0x0) at > ../../ompi/request/req_wait.c:262 > #4 0x00a1431f in mca_coll_inter_allgatherv_inter (sbuf=0xff8a3794, > scount=1, sdtype=0x8049400, rbuf=0xff8a3750, rcounts=0x80948e0, > disps=0x8093938, rdtype=0x8049400, comm=0x8094fb8, module=0x80954a0) > at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127 > #5 0x00d3198f in ompi_comm_determine_first (intercomm=0x8094fb8, > high=1) at ../../ompi/communicator/comm.c:1199 > #6 0x00d75833 in PMPI_Intercomm_merge (intercomm=0x8094fb8, high=1, > newcomm=0xff8a4c00) at pintercomm_merge.c:84 > #7 0x08048a16 in main (argc=892352312, argv=0x32323038) at client.c:28 > > I've tried both scenarios described: when hangs a client connecting > from machines B and C. In both cases bt looks the same. > How does it look like? > Shall I repost that using a different subject as Ralph suggested? > > Regards, > Grzegorz > > > > 2010/7/27 Edgar Gabriel <gabr...@cs.uh.edu>: >> based on your output shown here, there is absolutely nothing wrong >> (yet). Both processes are in the same function and do what they are >> supposed to do. >> >> However, I am fairly sure that the client process bt that you show is >> already part of current_intracomm. Could you try to create a bt of the >> process that is not yet part of current_intracomm (If I understand your >> code correctly, the intercommunicator is n-1 configuration, with each >> client process being part of n after the intercomm_merge). It would be >> interesting to see where that process is... >> >> Thanks >> Edgar >> >> On 7/27/2010 1:42 PM, Ralph Castain wrote: >>> This slides outside of my purview - I would suggest you post this question >>> with a different subject line specifically mentioning failure of >>> intercomm_merge to work so it attracts the attention of those with >>> knowledge of that area. >>> >>> >>> On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote: >>> >>>> So now I have a new question. >>>> When I run my server and a lot of clients on the same machine, >>>> everything looks fine. >>>> >>>> But when I try to run the clients on several machines the most >>>> frequent scenario is: >>>> * server is stared on machine A >>>> * X (= 1, 4, 10, ..) clients are started on machine B and they connect >>>> successfully >>>> * the first client starting on machine C connects successfully to the >>>> server, but the whole grid hangs on MPI_Comm_merge (all the processes >>>> from intercommunicator get there). >>>> >>>> As I said it's the most frequent scenario. Sometimes I can connect the >>>> clients from several machines. Sometimes it hangs (always on >>>> MPI_Comm_merge) when connecting the clients from machine B. >>>> The interesting thing is, that if before MPI_Comm_merge I send a dummy >>>> message on the intercommunicator from process rank 0 in one group to >>>> process rank 0 in the other one, it will not hang on MPI_Comm_merge. 
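In code, the dummy-message workaround described above might look roughly like the following. This is a sketch only; the function name, the tag, and the use of a symmetric MPI_Sendrecv are assumptions, not the poster's actual client.c/server.c:

#include <mpi.h>

/* Sketch of the workaround described above (not the poster's code):
 * rank 0 of each local group exchanges one dummy message over the
 * intercommunicator before calling the merge. */
static MPI_Comm merge_with_handshake(MPI_Comm intercomm, int high)
{
    int local_rank;
    MPI_Comm merged;

    MPI_Comm_rank(intercomm, &local_rank);      /* rank in the local group */
    if (local_rank == 0) {
        int send_token = 0, recv_token = 0;
        /* on an intercommunicator, dest/source 0 addresses rank 0 of the
         * remote group; a symmetric Sendrecv keeps either side from
         * blocking on the other */
        MPI_Sendrecv(&send_token, 1, MPI_INT, 0, 42,
                     &recv_token, 1, MPI_INT, 0, 42,
                     intercomm, MPI_STATUS_IGNORE);
    }
    MPI_Intercomm_merge(intercomm, high, &merged);
    return merged;
}

The effect is that the two rank-0 processes are already wired up before the merge's internal allgatherv runs, which may be why the hang reported in the backtraces above does not occur when the dummy message is sent.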
>>>> >>>> I've tried both versions with and without the first patch (ompi-server >>>> as orted) but it doesn't change the behavior. >>>> >>>> I've attached gdb to my server, this is bt: >>>> #0 0xffffe410 in __kernel_vsyscall () >>>> #1 0x00637afc in sched_yield () from /lib/libc.so.6 >>>> #2 0xf7e8ce31 in opal_progress () at >>>> ../../opal/runtime/opal_progress.c:220 >>>> #3 0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at >>>> ../../opal/threads/condition.h:99 >>>> #4 0xf7f60dee in ompi_request_default_wait_all (count=2, >>>> requests=0xff8d7754, statuses=0x0) at >>>> ../../ompi/request/req_wait.c:262 >>>> #5 0xf7d3e221 in mca_coll_inter_allgatherv_inter (sbuf=0xff8d7824, >>>> scount=1, sdtype=0x8049200, rbuf=0xff8d77e0, rcounts=0x9783df8, >>>> disps=0x9755520, rdtype=0x8049200, comm=0x978c2a8, module=0x9794b08) >>>> at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127 >>>> #6 0xf7f4c615 in ompi_comm_determine_first (intercomm=0x978c2a8, >>>> high=0) at ../../ompi/communicator/comm.c:1199 >>>> #7 0xf7f8d1d9 in PMPI_Intercomm_merge (intercomm=0x978c2a8, high=0, >>>> newcomm=0xff8d78c0) at pintercomm_merge.c:84 >>>> #8 0x0804893c in main (argc=Cannot access memory at address 0xf >>>> ) at server.c:50 >>>> >>>> And this is bt from one of the clients: >>>> #0 0xffffe410 in __kernel_vsyscall () >>>> #1 0x0064993b in poll () from /lib/libc.so.6 >>>> #2 0xf7de027f in poll_dispatch (base=0x8643fb8, arg=0x86442d8, >>>> tv=0xff82299c) at ../../../opal/event/poll.c:168 >>>> #3 0xf7dde4b2 in opal_event_base_loop (base=0x8643fb8, flags=2) at >>>> ../../../opal/event/event.c:807 >>>> #4 0xf7dde34f in opal_event_loop (flags=2) at >>>> ../../../opal/event/event.c:730 >>>> #5 0xf7dcfc77 in opal_progress () at >>>> ../../opal/runtime/opal_progress.c:189 >>>> #6 0xf7ea80b8 in opal_condition_wait (c=0xf7f25160, m=0xf7f251a0) at >>>> ../../opal/threads/condition.h:99 >>>> #7 0xf7ea7ff3 in ompi_request_wait_completion (req=0x8686680) at >>>> ../../ompi/request/request.h:375 >>>> #8 0xf7ea7ef1 in ompi_request_default_wait (req_ptr=0xff822ae8, >>>> status=0x0) at ../../ompi/request/req_wait.c:37 >>>> #9 0xf7c663a6 in ompi_coll_tuned_bcast_intra_generic >>>> (buffer=0xff822d20, original_count=1, datatype=0x868bd00, root=0, >>>> comm=0x86aa7f8, module=0x868b700, count_by_segment=1, tree=0x868b3d8) >>>> at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:237 >>>> #10 0xf7c668ea in ompi_coll_tuned_bcast_intra_binomial >>>> (buffer=0xff822d20, count=1, datatype=0x868bd00, root=0, >>>> comm=0x86aa7f8, module=0x868b700, segsize=0) >>>> at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:368 >>>> #11 0xf7c5af12 in ompi_coll_tuned_bcast_intra_dec_fixed >>>> (buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8, >>>> module=0x868b700) >>>> at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256 >>>> #12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1, >>>> datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at >>>> ../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44 >>>> #13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64, >>>> scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188, >>>> disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300, >>>> module=0x86aae18) at >>>> ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134 >>>> #14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300, >>>> high=0) at ../../ompi/communicator/comm.c:1199 >>>> #15 0xf7ed7833 in PMPI_Intercomm_merge 
(intercomm=0x86aa300, high=0, >>>> newcomm=0xff8241d0) at pintercomm_merge.c:84 >>>> #16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47 >>>> >>>> >>>> >>>> What do you think may cause the problem? >>>> >>>> >>>> 2010/7/26 Ralph Castain <r...@open-mpi.org>: >>>>> No problem at all - glad it works! >>>>> >>>>> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote: >>>>> >>>>>> Hi, >>>>>> I'm very sorry, but the problem was on my side. My installation >>>>>> process was not always taking the newest sources of openmpi. In this >>>>>> case it hasn't installed the version with the latest patch. Now I >>>>>> think everything works fine - I could run over 130 processes with no >>>>>> problems. >>>>>> I'm sorry again that I've wasted your time. And thank you for the patch. >>>>>> >>>>>> 2010/7/21 Ralph Castain <r...@open-mpi.org>: >>>>>>> We're having some problem replicating this once my patches are applied. >>>>>>> Can you send us your configure cmd? Just the output from "head >>>>>>> config.log" will do for now. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote: >>>>>>> >>>>>>>> My start script looks almost exactly the same as the one published by >>>>>>>> Edgar, ie. the processes are starting one by one with no delay. >>>>>>>> >>>>>>>> 2010/7/20 Ralph Castain <r...@open-mpi.org>: >>>>>>>>> Grzegorz: something occurred to me. When you start all these >>>>>>>>> processes, how are you staggering their wireup? Are they flooding us, >>>>>>>>> or are you time-shifting them a little? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote: >>>>>>>>> >>>>>>>>>> Hm, so I am not sure how to approach this. First of all, the test >>>>>>>>>> case >>>>>>>>>> works for me. I used up to 80 clients, and for both optimized and >>>>>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4 >>>>>>>>>> series, but the communicator code is identical in both cases). >>>>>>>>>> Clearly, >>>>>>>>>> the patch from Ralph is necessary to make it work. >>>>>>>>>> >>>>>>>>>> Additionally, I went through the communicator creation code for >>>>>>>>>> dynamic >>>>>>>>>> communicators trying to find spots that could create problems. The >>>>>>>>>> only >>>>>>>>>> place that I found the number 64 appear is the fortran-to-c mapping >>>>>>>>>> arrays (e.g. for communicators), where the initial size of the table >>>>>>>>>> is >>>>>>>>>> 64. I looked twice over the pointer-array code to see whether we >>>>>>>>>> could >>>>>>>>>> have a problem their (since it is a key-piece of the cid allocation >>>>>>>>>> code >>>>>>>>>> for communicators), but I am fairly confident that it is correct. >>>>>>>>>> >>>>>>>>>> Note, that we have other (non-dynamic tests), were comm_set is called >>>>>>>>>> 100,000 times, and the code per se does not seem to have a problem >>>>>>>>>> due >>>>>>>>>> to being called too often. So I am not sure what else to look at. >>>>>>>>>> >>>>>>>>>> Edgar >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote: >>>>>>>>>>> As far as I can tell, it appears the problem is somewhere in our >>>>>>>>>>> communicator setup. The people knowledgeable on that area are going >>>>>>>>>>> to look into it later this week. >>>>>>>>>>> >>>>>>>>>>> I'm creating a ticket to track the problem and will copy you on it. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Bad news.. >>>>>>>>>>>>> I've tried the latest patch with and without the prior one, but it >>>>>>>>>>>>> hasn't changed anything. I've also tried using the old code but >>>>>>>>>>>>> with >>>>>>>>>>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also >>>>>>>>>>>>> didn't >>>>>>>>>>>>> help. >>>>>>>>>>>>> While looking through the sources of openmpi-1.4.2 I couldn't >>>>>>>>>>>>> find any >>>>>>>>>>>>> call of the function ompi_dpm_base_mark_dyncomm. >>>>>>>>>>>> >>>>>>>>>>>> It isn't directly called - it shows in ompi_comm_set as >>>>>>>>>>>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, >>>>>>>>>>>> but I guess something else is also being hit. Have to look >>>>>>>>>>>> further... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>> Just so you don't have to wait for 1.4.3 release, here is the >>>>>>>>>>>>>> patch (doesn't include the prior patch). >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>> Dug around a bit and found the problem!! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have no idea who or why this was done, but somebody set a >>>>>>>>>>>>>>>> limit of 64 separate jobids in the dynamic init called by >>>>>>>>>>>>>>>> ompi_comm_set, which builds the intercommunicator. >>>>>>>>>>>>>>>> Unfortunately, they hard-wired the array size, but never check >>>>>>>>>>>>>>>> that size before adding to it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So after 64 calls to connect_accept, you are overwriting other >>>>>>>>>>>>>>>> areas of the code. As you found, hitting 66 causes it to >>>>>>>>>>>>>>>> segfault. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'll fix this on the developer's trunk (I'll also add that >>>>>>>>>>>>>>>> original patch to it). Rather than my searching this thread in >>>>>>>>>>>>>>>> detail, can you remind me what version you are using so I can >>>>>>>>>>>>>>>> patch it too? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm using 1.4.2 >>>>>>>>>>>>>>> Thanks a lot and I'm looking forward for the patch. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for your patience with this! >>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't change >>>>>>>>>>>>>>>>> anything. >>>>>>>>>>>>>>>>> Following your advice I've run my process using gdb. >>>>>>>>>>>>>>>>> Unfortunately I >>>>>>>>>>>>>>>>> didn't get anything more than: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>>>>>>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)] >>>>>>>>>>>>>>>>> 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (gdb) bt >>>>>>>>>>>>>>>>> #0 0xf7f39905 in ompi_comm_set () from >>>>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>>>>> #1 0xf7e3ba95 in connect_accept () from >>>>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so >>>>>>>>>>>>>>>>> #2 0xf7f62013 in PMPI_Comm_connect () from >>>>>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0 >>>>>>>>>>>>>>>>> #3 0x080489ed in main (argc=825832753, argv=0x34393638) at >>>>>>>>>>>>>>>>> client.c:43 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What's more: when I've added a breakpoint on ompi_comm_set in >>>>>>>>>>>>>>>>> 66th >>>>>>>>>>>>>>>>> process and stepped a couple of instructions, one of the other >>>>>>>>>>>>>>>>> processes crashed (as usualy on ompi_comm_set) earlier than >>>>>>>>>>>>>>>>> 66th did. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Finally I decided to recompile openmpi using -g flag for gcc. >>>>>>>>>>>>>>>>> In this >>>>>>>>>>>>>>>>> case the 66 processes issue has gone! I was running my >>>>>>>>>>>>>>>>> applications >>>>>>>>>>>>>>>>> exactly the same way as previously (even without >>>>>>>>>>>>>>>>> recompilation) and >>>>>>>>>>>>>>>>> I've run successfully over 130 processes. >>>>>>>>>>>>>>>>> When switching back to the openmpi compilation without -g it >>>>>>>>>>>>>>>>> again segfaults. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Any ideas? I'm really confused. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>> I would guess the #files limit of 1024. However, if it >>>>>>>>>>>>>>>>>> behaves the same way when spread across multiple machines, I >>>>>>>>>>>>>>>>>> would suspect it is somewhere in your program itself. Given >>>>>>>>>>>>>>>>>> that the segfault is in your process, can you use gdb to >>>>>>>>>>>>>>>>>> look at the core file and see where and why it fails? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>>>>> sorry for the late response, but I couldn't find free >>>>>>>>>>>>>>>>>>>>> time to play >>>>>>>>>>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. >>>>>>>>>>>>>>>>>>>>> I've launched >>>>>>>>>>>>>>>>>>>>> my processes in the way you've described and I think it's >>>>>>>>>>>>>>>>>>>>> working as >>>>>>>>>>>>>>>>>>>>> you expected. None of my processes runs the orted daemon >>>>>>>>>>>>>>>>>>>>> and they can >>>>>>>>>>>>>>>>>>>>> perform MPI operations. Unfortunately I'm still hitting >>>>>>>>>>>>>>>>>>>>> the 65 >>>>>>>>>>>>>>>>>>>>> processes issue :( >>>>>>>>>>>>>>>>>>>>> Maybe I'm doing something wrong. >>>>>>>>>>>>>>>>>>>>> I attach my source code. If anybody could have a look on >>>>>>>>>>>>>>>>>>>>> this, I would >>>>>>>>>>>>>>>>>>>>> be grateful. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> When I run that code with clients_count <= 65 everything >>>>>>>>>>>>>>>>>>>>> works fine: >>>>>>>>>>>>>>>>>>>>> all the processes create a common grid, exchange some >>>>>>>>>>>>>>>>>>>>> information and >>>>>>>>>>>>>>>>>>>>> disconnect. 
>>>>>>>>>>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on >>>>>>>>>>>>>>>>>>>>> MPI_Comm_connect (segmentation fault). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I didn't have time to check the code, but my guess is that >>>>>>>>>>>>>>>>>>>> you are still hitting some kind of file descriptor or >>>>>>>>>>>>>>>>>>>> other limit. Check to see what your limits are - usually >>>>>>>>>>>>>>>>>>>> "ulimit" will tell you. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> My limitations are: >>>>>>>>>>>>>>>>>>> time(seconds) unlimited >>>>>>>>>>>>>>>>>>> file(blocks) unlimited >>>>>>>>>>>>>>>>>>> data(kb) unlimited >>>>>>>>>>>>>>>>>>> stack(kb) 10240 >>>>>>>>>>>>>>>>>>> coredump(blocks) 0 >>>>>>>>>>>>>>>>>>> memory(kb) unlimited >>>>>>>>>>>>>>>>>>> locked memory(kb) 64 >>>>>>>>>>>>>>>>>>> process 200704 >>>>>>>>>>>>>>>>>>> nofiles 1024 >>>>>>>>>>>>>>>>>>> vmemory(kb) unlimited >>>>>>>>>>>>>>>>>>> locks unlimited >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Which one do you think could be responsible for that? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I was trying to run all the 66 processes on one machine or >>>>>>>>>>>>>>>>>>> spread them >>>>>>>>>>>>>>>>>>> across several machines and it always crashes the same way >>>>>>>>>>>>>>>>>>> on the 66th >>>>>>>>>>>>>>>>>>> process. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Another thing I would like to know is if it's normal that >>>>>>>>>>>>>>>>>>>>> any of my >>>>>>>>>>>>>>>>>>>>> processes when calling MPI_Comm_connect or >>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept when the >>>>>>>>>>>>>>>>>>>>> other side is not ready, is eating up a full CPU >>>>>>>>>>>>>>>>>>>>> available. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes - the waiting process is polling in a tight loop >>>>>>>>>>>>>>>>>>>> waiting for the connection to be made. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Any help would be appreciated, >>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does >>>>>>>>>>>>>>>>>>>>>> pretty much what you >>>>>>>>>>>>>>>>>>>>>> want. Checkout "man ompi-server". I originally wrote >>>>>>>>>>>>>>>>>>>>>> that code to support >>>>>>>>>>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but >>>>>>>>>>>>>>>>>>>>>> we can utilize it >>>>>>>>>>>>>>>>>>>>>> here too. Have to blame me for not making it more >>>>>>>>>>>>>>>>>>>>>> publicly known. >>>>>>>>>>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the >>>>>>>>>>>>>>>>>>>>>> singleton startup >>>>>>>>>>>>>>>>>>>>>> to provide your desired support. This solution works in >>>>>>>>>>>>>>>>>>>>>> the following >>>>>>>>>>>>>>>>>>>>>> manner: >>>>>>>>>>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This >>>>>>>>>>>>>>>>>>>>>> starts a persistent >>>>>>>>>>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous >>>>>>>>>>>>>>>>>>>>>> point for >>>>>>>>>>>>>>>>>>>>>> independently started applications. The problem with >>>>>>>>>>>>>>>>>>>>>> starting different >>>>>>>>>>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies >>>>>>>>>>>>>>>>>>>>>> in the need to have >>>>>>>>>>>>>>>>>>>>>> the applications find each other. If they can't discover >>>>>>>>>>>>>>>>>>>>>> contact info for >>>>>>>>>>>>>>>>>>>>>> the other app, then they can't wire up their >>>>>>>>>>>>>>>>>>>>>> interconnects. 
The >>>>>>>>>>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I >>>>>>>>>>>>>>>>>>>>>> don't like that >>>>>>>>>>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out. >>>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename>" in the >>>>>>>>>>>>>>>>>>>>>> environment where you >>>>>>>>>>>>>>>>>>>>>> will start your processes. This will allow your >>>>>>>>>>>>>>>>>>>>>> singleton processes to find >>>>>>>>>>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to >>>>>>>>>>>>>>>>>>>>>> connect the MPI >>>>>>>>>>>>>>>>>>>>>> publish/subscribe system for you. >>>>>>>>>>>>>>>>>>>>>> 3. run your processes. As they think they are >>>>>>>>>>>>>>>>>>>>>> singletons, they will detect >>>>>>>>>>>>>>>>>>>>>> the presence of the above envar and automatically >>>>>>>>>>>>>>>>>>>>>> connect themselves to the >>>>>>>>>>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with >>>>>>>>>>>>>>>>>>>>>> the ability to perform >>>>>>>>>>>>>>>>>>>>>> any MPI-2 operation. >>>>>>>>>>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully >>>>>>>>>>>>>>>>>>>>>> it will meet your >>>>>>>>>>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so >>>>>>>>>>>>>>>>>>>>>> long as you locate >>>>>>>>>>>>>>>>>>>>>> it where all of the processes can find the contact file >>>>>>>>>>>>>>>>>>>>>> and can open a TCP >>>>>>>>>>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple >>>>>>>>>>>>>>>>>>>>>> ompi-servers into a >>>>>>>>>>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot >>>>>>>>>>>>>>>>>>>>>> directly access a >>>>>>>>>>>>>>>>>>>>>> server due to network segmentation), but it's a tad >>>>>>>>>>>>>>>>>>>>>> tricky - let me know if >>>>>>>>>>>>>>>>>>>>>> you require it and I'll try to help. >>>>>>>>>>>>>>>>>>>>>> If you have trouble wiring them all into a single >>>>>>>>>>>>>>>>>>>>>> communicator, you might >>>>>>>>>>>>>>>>>>>>>> ask separately about that and see if one of our MPI >>>>>>>>>>>>>>>>>>>>>> experts can provide >>>>>>>>>>>>>>>>>>>>>> advice (I'm just the RTE grunt). >>>>>>>>>>>>>>>>>>>>>> HTH - let me know how this works for you and I'll >>>>>>>>>>>>>>>>>>>>>> incorporate it into future >>>>>>>>>>>>>>>>>>>>>> OMPI releases. >>>>>>>>>>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this >>>>>>>>>>>>>>>>>>>>>> our small >>>>>>>>>>>>>>>>>>>>>> project/experiment. >>>>>>>>>>>>>>>>>>>>>> We definitely would like to give your patch a try. But >>>>>>>>>>>>>>>>>>>>>> could you please >>>>>>>>>>>>>>>>>>>>>> explain your solution a little more? >>>>>>>>>>>>>>>>>>>>>> You still would like to start one mpirun per mpi grid, >>>>>>>>>>>>>>>>>>>>>> and then have >>>>>>>>>>>>>>>>>>>>>> processes started by us to join the MPI comm? >>>>>>>>>>>>>>>>>>>>>> It is a good solution of course. >>>>>>>>>>>>>>>>>>>>>> But it would be especially preferable to have one daemon >>>>>>>>>>>>>>>>>>>>>> running >>>>>>>>>>>>>>>>>>>>>> persistently on our "entry" machine that can handle >>>>>>>>>>>>>>>>>>>>>> several mpi grid starts. >>>>>>>>>>>>>>>>>>>>>> Can your patch help us this way too? >>>>>>>>>>>>>>>>>>>>>> Thanks for your help! 
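On the application side, the rendezvous that ompi-server provides can be used through the standard MPI-2 publish/subscribe calls, along these lines. This is a sketch under the assumption that the processes were started with OMPI_MCA_orte_server pointing at the ompi-server URI file; the service name "my-grid" is illustrative, not taken from the attached client.c/server.c:

#include <mpi.h>

/* Sketch only: singletons rendezvousing through the publish/subscribe
 * support that ompi-server exposes.  Call after MPI_Init. */

/* server side: open a port and publish it under a well-known name */
void server_rendezvous(MPI_Comm *inter)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("my-grid", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, inter);
}

/* client side: look the port up and connect to it */
void client_rendezvous(MPI_Comm *inter)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Lookup_name("my-grid", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, inter);
}

With the environment variable set as in step 2, the name lookup should go through the ompi-server daemon rather than a per-process orted.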
>>>>>>>>>>>>>>>>>>>>>> Krzysztof >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain >>>>>>>>>>>>>>>>>>>>>> <r...@open-mpi.org> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> In thinking about this, my proposed solution won't >>>>>>>>>>>>>>>>>>>>>>> entirely fix the >>>>>>>>>>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. >>>>>>>>>>>>>>>>>>>>>>> I believe I can >>>>>>>>>>>>>>>>>>>>>>> resolve that one as well, but it would require a patch. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Would you like me to send you something you could try? >>>>>>>>>>>>>>>>>>>>>>> Might take a couple >>>>>>>>>>>>>>>>>>>>>>> of iterations to get it right... >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot >>>>>>>>>>>>>>>>>>>>>>>> guarantee it: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using >>>>>>>>>>>>>>>>>>>>>>>> mpirun that includes >>>>>>>>>>>>>>>>>>>>>>>> the following option: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> mpirun -report-uri file >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> where file is some filename that mpirun can create and >>>>>>>>>>>>>>>>>>>>>>>> insert its >>>>>>>>>>>>>>>>>>>>>>>> contact info into it. This can be a relative or >>>>>>>>>>>>>>>>>>>>>>>> absolute path. This process >>>>>>>>>>>>>>>>>>>>>>>> must remain alive throughout your application - >>>>>>>>>>>>>>>>>>>>>>>> doesn't matter what it does. >>>>>>>>>>>>>>>>>>>>>>>> It's purpose is solely to keep mpirun alive. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your >>>>>>>>>>>>>>>>>>>>>>>> environment, where >>>>>>>>>>>>>>>>>>>>>>>> "file" is the filename given above. This will tell >>>>>>>>>>>>>>>>>>>>>>>> your processes how to >>>>>>>>>>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to >>>>>>>>>>>>>>>>>>>>>>>> handle the connect/accept >>>>>>>>>>>>>>>>>>>>>>>> operations >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept >>>>>>>>>>>>>>>>>>>>>>>> to each other. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that >>>>>>>>>>>>>>>>>>>>>>>> these processes >>>>>>>>>>>>>>>>>>>>>>>> will all have the same rank && name since they all >>>>>>>>>>>>>>>>>>>>>>>> start as singletons. >>>>>>>>>>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a >>>>>>>>>>>>>>>>>>>>>>>> try. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some >>>>>>>>>>>>>>>>>>>>>>>>> process that I >>>>>>>>>>>>>>>>>>>>>>>>> could run once on my system and it could help in >>>>>>>>>>>>>>>>>>>>>>>>> creating those >>>>>>>>>>>>>>>>>>>>>>>>> groups. >>>>>>>>>>>>>>>>>>>>>>>>> My typical scenario is: >>>>>>>>>>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun >>>>>>>>>>>>>>>>>>>>>>>>> 2. connect them into MPI group >>>>>>>>>>>>>>>>>>>>>>>>> 3. do some job >>>>>>>>>>>>>>>>>>>>>>>>> 4. exit all N processes >>>>>>>>>>>>>>>>>>>>>>>>> 5. 
goto 1 >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>: >>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Ralph for your explanation. >>>>>>>>>>>>>>>>>>>>>>>>>> And, apart from that descriptors' issue, is there >>>>>>>>>>>>>>>>>>>>>>>>>> any other way to >>>>>>>>>>>>>>>>>>>>>>>>>> solve my problem, i.e. to run separately a number of >>>>>>>>>>>>>>>>>>>>>>>>>> processes, >>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun and then to collect them into an MPI >>>>>>>>>>>>>>>>>>>>>>>>>> intracomm group? >>>>>>>>>>>>>>>>>>>>>>>>>> If I for example would need to run some 'server >>>>>>>>>>>>>>>>>>>>>>>>>> process' (even using >>>>>>>>>>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use >>>>>>>>>>>>>>>>>>>>>>>>>>> mpirun, and are not >>>>>>>>>>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" >>>>>>>>>>>>>>>>>>>>>>>>>>> launch (i.e., starting >>>>>>>>>>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of >>>>>>>>>>>>>>>>>>>>>>>>>>> those processes thinks it is >>>>>>>>>>>>>>>>>>>>>>>>>>> a singleton - yes? >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton >>>>>>>>>>>>>>>>>>>>>>>>>>> immediately >>>>>>>>>>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to >>>>>>>>>>>>>>>>>>>>>>>>>>> behave just like mpirun. >>>>>>>>>>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 >>>>>>>>>>>>>>>>>>>>>>>>>>> operations such as >>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are >>>>>>>>>>>>>>>>>>>>>>>>>>> singletons, then >>>>>>>>>>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This >>>>>>>>>>>>>>>>>>>>>>>>>>> eats up a lot of file >>>>>>>>>>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting >>>>>>>>>>>>>>>>>>>>>>>>>>> this 65 process limit - >>>>>>>>>>>>>>>>>>>>>>>>>>> your system is probably running out of file >>>>>>>>>>>>>>>>>>>>>>>>>>> descriptors. You might check you >>>>>>>>>>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised >>>>>>>>>>>>>>>>>>>>>>>>>>> upward. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use >>>>>>>>>>>>>>>>>>>>>>>>>>>> some special way for >>>>>>>>>>>>>>>>>>>>>>>>>>>> running my processes provided by the environment >>>>>>>>>>>>>>>>>>>>>>>>>>>> in which I'm >>>>>>>>>>>>>>>>>>>>>>>>>>>> working >>>>>>>>>>>>>>>>>>>>>>>>>>>> and unfortunately I can't use mpirun. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - all it does is >>>>>>>>>>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> It mainly sits there >>>>>>>>>>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> support the job. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Otherwise, I know of no >>>>>>>>>>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes communicating >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mpirun and create >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator after the startup. Any ideas >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to do this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> efficiently? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are connecting >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> one by >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all the processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are already in the group need to call >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. This means >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect I need to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> collect all the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> n-1 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> run about 40 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which I'd like to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> avoid. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> try to connect >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 66-th >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> process the root of the existing group segfaults >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird as everything >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> works fine for at >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 65 processes. Is there any limitation I don't >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know about? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> run my processes >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as MPI_COMM_SELF. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it to the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created? 
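A minimal sketch of the one-by-one join pattern described above follows. The function shape, the port handling, and the high/low choices for the merge are illustrative assumptions, not the attached client.c/server.c:

#include <mpi.h>

/* Sketch only: growing an intracommunicator one process at a time.
 * MPI_Comm_accept is collective over the current intracommunicator, so
 * every process that has already joined must take part in every later
 * accept/merge -- which is the scaling problem described above. */
static MPI_Comm grow_group(const char *port, int is_new_client,
                           int remaining_clients)
{
    MPI_Comm current, inter, merged;

    if (is_new_client) {
        /* join the existing group once, as the "high" side of the merge */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Intercomm_merge(inter, 1, &current);
        MPI_Comm_free(&inter);
    } else {
        current = MPI_COMM_SELF;            /* the server starts alone */
    }

    /* everyone already in 'current' must accept each remaining newcomer */
    while (remaining_clients-- > 0) {
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, current, &inter);
        MPI_Intercomm_merge(inter, 0, &merged);
        MPI_Comm_free(&inter);
        if (current != MPI_COMM_SELF)
            MPI_Comm_free(&current);
        current = merged;
    }
    return current;     /* intracommunicator over server + all clients */
}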
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> <client.c><server.c>_______________________________________________ >>>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> users mailing list >>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> users mailing list >>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> users mailing list >>>>>>>>>> us...@open-mpi.org >>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> users mailing list >>>>>>>>> us...@open-mpi.org >>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> users mailing list >>>>>>>> us...@open-mpi.org >>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> users mailing list >>>>>>> us...@open-mpi.org >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> users mailing list >>>>>> us...@open-mpi.org >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>> >>>>> >>>>> _______________________________________________ >>>>> users mailing list >>>>> us...@open-mpi.org >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> users mailing list >>>> us...@open-mpi.org >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> -- >> Edgar Gabriel >> Assistant Professor >> Parallel Software Technologies Lab http://pstl.cs.uh.edu >> Department of Computer Science University of Houston >> Philip G. 
Hoffman Hall, Room 524 Houston, TX-77204, USA >> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335 >> >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Edgar Gabriel Assistant Professor Parallel Software Technologies Lab http://pstl.cs.uh.edu Department of Computer Science University of Houston Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335