Doh - yes it should! I'll fix it right now. Thanks!
On Jul 26, 2010, at 9:28 PM, Philippe wrote:

> Ralph,
>
> I was able to test the generic module and it seems to be working.
>
> One question, though: the function orte_ess_generic_component_query in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the argument "OMPI_MCA_env", which seems to cause the module to fail to load. Shouldn't it be "OMPI_MCA_ess"?
>
> .....
>
>  /* only pick us if directed to do so */
>  if (NULL != (pick = getenv("OMPI_MCA_env")) &&
>      0 == strcmp(pick, "generic")) {
>      *priority = 1000;
>      *module = (mca_base_module_t *)&orte_ess_generic_module;
>
> ...
>
> p.
>
> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them.
>>
>> Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am pretty sure it should work.
>>
>>
>> On Jul 22, 2010, at 3:22 PM, Philippe wrote:
>>
>>> Ralph,
>>>
>>> Thank you so much!!
>>>
>>> I'll give it a try and let you know.
>>>
>>> I know it's a tough question, but how stable is the dev trunk? Can I just grab the latest and run, or am I better off taking your changes and copying them back into a stable release? (If so, which one? 1.4? 1.5?)
>>>
>>> p.
>>>
>>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> It was easier for me to just construct this module than to explain how to do so :-)
>>>>
>>>> I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it.
>>>>
>>>> Here are the envars you'll need to provide:
>>>>
>>>> Each process needs to get the same following values:
>>>>
>>>> * OMPI_MCA_ess=generic
>>>> * OMPI_MCA_orte_num_procs=<number of MPI procs>
>>>> * OMPI_MCA_orte_nodes=<a comma-separated list of nodenames where MPI procs reside>
>>>> * OMPI_MCA_orte_ppn=<number of procs/node>
>>>>
>>>> Note that I have assumed this last value is a constant for simplicity. If that isn't the case, let me know - you could instead provide it as a comma-separated list of values with an entry for each node.
>>>>
>>>> In addition, you need to provide the following value that will be unique to each process:
>>>>
>>>> * OMPI_MCA_orte_rank=<MPI rank>
>>>>
>>>> Finally, you have to provide a range of static TCP ports for use by the processes. Pick any range that you know will be available across all the nodes. You then need to ensure that each process sees the following envar:
>>>>
>>>> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with your range
>>>>
>>>> You will need a port range that is at least equal to the ppn for the job (each proc on a node will take one of the provided ports).
>>>>
>>>> That should do it. I compute everything else I need from those values.
>>>>
>>>> Does that work for you?
>>>> Ralph
>>>>
>>>>
>>>> On Jul 22, 2010, at 6:48 AM, Philippe wrote:
>>>>
>>>>> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>> On Jul 21, 2010, at 7:44 AM, Philippe wrote:
>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> Sorry for the late reply -- I was away on vacation.
>>>>>>
>>>>>> no problem at all!
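To make the envar recipe quoted above concrete, here is a rough sketch (in C, single node only, for brevity) of how a custom launcher might export those variables before exec'ing each MPI process. The binary name "./my_mpi_app", the node name, and the counts are placeholders, and error handling is omitted:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        const int   num_procs = 4;          /* total MPI procs (placeholder) */
        const char *nodes     = "node001";  /* comma-separated node list (placeholder) */
        const int   ppn       = 4;          /* procs per node (placeholder) */

        for (int rank = 0; rank < num_procs; rank++) {
            pid_t pid = fork();
            if (0 == pid) {
                char buf[32];

                /* values common to every process in the job */
                setenv("OMPI_MCA_ess", "generic", 1);
                snprintf(buf, sizeof(buf), "%d", num_procs);
                setenv("OMPI_MCA_orte_num_procs", buf, 1);
                setenv("OMPI_MCA_orte_nodes", nodes, 1);
                snprintf(buf, sizeof(buf), "%d", ppn);
                setenv("OMPI_MCA_orte_ppn", buf, 1);
                setenv("OMPI_MCA_oob_tcp_static_ports", "6000-6010", 1);

                /* the one value that is unique to each process */
                snprintf(buf, sizeof(buf), "%d", rank);
                setenv("OMPI_MCA_orte_rank", buf, 1);

                execl("./my_mpi_app", "my_mpi_app", (char *)NULL);  /* placeholder binary */
                _exit(1);  /* only reached if exec fails */
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }

The point is simply that the per-job values are identical for every process, while OMPI_MCA_orte_rank differs per process.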
>>>>>>
>>>>>>>
>>>>>>> Regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and the "integrated" case. I will try to find out what part of the code is allocating memory in a loop.
>>>>>>
>>>>>> Hmmm.... that sounds like a bug in your program. Let me know what you find.
>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>> Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series.
>>>>>>>
>>>>>>> Great -- I'll give it a try.
>>>>>>>
>>>>>>>> On the notion of integrating OMPI into your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing.
>>>>>>>
>>>>>>> I'm working on creating operators using MPI for the IBM product "InfoSphere Streams". It has its own launching mechanism to start the processes. However, I can pass some information to the processes that belong to the same job (Streams job -- which should neatly map to an MPI job).
>>>>>>>
>>>>>>>> Only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it through massive calls to MPI_Comm_connect.
>>>>>>>
>>>>>>> In the general case, that would be a problem, but for my prototype, this is acceptable.
>>>>>>>
>>>>>>> In general, each process is composed of operators; some may be MPI related and some may not. But in my case, I know ahead of time which processes will be part of the MPI job, so I can easily deal with the fact that they would block on MPI_Init (actually -- MPI_Init_thread, since it's using a lot of threads).
>>>>>>
>>>>>> We have talked in the past about creating a non-blocking MPI_Init as an extension to the standard. It would lock you to Open MPI, though...
>>>>>>
>>>>>> Regardless, at some point you would have to know how many processes are going to be part of the job so you can know when MPI_Init is complete. I would think you would require that info for the singleton wireup anyway - yes? Otherwise, how would you know when to quit running connect-accept?
>>>>>>
>>>>> The short answer is yes... although the longer answer is a bit more complicated. Currently I do know the number of connects I need to do on a per-port basis. A job can contain an arbitrary number of MPI processes, each opening one or more ports, so I know the count port by port, but I don't need to worry about how many MPI processes there are globally. To make things a bit more complicated, each MPI operator can be "fused" with other operators to make a process. Each fused operator may or may not require MPI.
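A quick aside on the MPI_Init_thread point above: since the operators run many threads, the relevant entry point is MPI_Init_thread rather than plain MPI_Init. A minimal, generic sketch of that call (nothing Streams-specific; it just illustrates the blocking initialization being discussed):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int provided, rank, size;

        /* Blocks until every process in the job has reached initialization --
         * the behaviour discussed above for a job wired up at init time. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "warning: MPI_THREAD_MULTIPLE not available (got %d)\n", provided);
        }

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("rank %d of %d initialized\n", rank, size);

        MPI_Finalize();
        return 0;
    }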
>>>>> The bottom line is, to get the total number of processes to calculate rank & size, I need to reverse engineer the fusing that the compiler may do.
>>>>>
>>>>> But that's OK, I'm willing to do that for our prototype :-)
>>>>>
>>>>>>>
>>>>>>> Is there documentation or an example I can use to see what information I can pass to the processes to enable that? Is it just environment variables?
>>>>>>
>>>>>> No real documentation - a lack I should probably fill. At the moment, we don't have a "generic" module for standalone launch, but I can create one as it is pretty trivial. I would then need you to pass each process envars telling it the total number of processes in the MPI job, its rank within that job, and a file where some rendezvous process (can be rank=0) has provided that port string. Armed with that info, I can wire up the job.
>>>>>>
>>>>>> Won't be as scalable as an mpirun-initiated startup, but will be much better than doing it from singletons.
>>>>>
>>>>> That would be great. I can definitely pass environment variables to each process.
>>>>>
>>>>>>
>>>>>> Or if you prefer, we could set up an "infosphere" module that we could customize for this system. Main thing here would be to provide us with some kind of regex (or access to a file containing the info) that describes the map of rank to node so we can construct the wireup communication pattern.
>>>>>>
>>>>> I think for our prototype we are fine with the first method. I'd leave the cleaner implementation as a task for the product team ;-)
>>>>>
>>>>> Regarding the "generic" module, is that something you can put together quickly? Can I help in any way?
>>>>>
>>>>> Thanks!
>>>>> p
>>>>>
>>>>>> Either way would work. The second is more scalable, but I don't know if you have (or can construct) the map info.
>>>>>>
>>>>>>>
>>>>>>> Many thanks!
>>>>>>> p.
>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 18, 2010, at 4:09 PM, Philippe wrote:
>>>>>>>>
>>>>>>>>> Ralph,
>>>>>>>>>
>>>>>>>>> Thanks for investigating.
>>>>>>>>>
>>>>>>>>> I've applied the two patches you mentioned earlier and ran with the ompi server. Although I was able to run our standalone test, when I integrated the changes to our code, the processes entered a crazy loop and allocated all the available memory when calling MPI_Comm_connect. I was not able to identify why it works standalone but not integrated with our code. If I find out why, I'll let you know.
>>>>>>>>>
>>>>>>>>> Looking forward to your findings. We'll be happy to test any patches if you have some!
>>>>>>>>>
>>>>>>>>> p.
>>>>>>>>>
>>>>>>>>> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>> Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference.
>>>>>>>>>>
>>>>>>>>>> The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way.
>>>>>>>>>>
>>>>>>>>>> Regardless, I will play with this a little as time allows. Thanks for the reproducer!
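As a small debugging aid when hooking a custom launcher up to the generic module, something like the following can be run in place of the real application to confirm that each process actually sees the envars listed earlier in the thread. This is only a sketch for checking the launcher side; it is not part of OMPI:

    #include <stdio.h>
    #include <stdlib.h>

    /* Print the wireup-related envars this process was given.  The variable
     * names are taken from the list earlier in the thread. */
    static const char *show(const char *name)
    {
        const char *v = getenv(name);
        return (NULL != v) ? v : "(not set)";
    }

    int main(void)
    {
        printf("OMPI_MCA_ess                  = %s\n", show("OMPI_MCA_ess"));
        printf("OMPI_MCA_orte_num_procs       = %s\n", show("OMPI_MCA_orte_num_procs"));
        printf("OMPI_MCA_orte_nodes           = %s\n", show("OMPI_MCA_orte_nodes"));
        printf("OMPI_MCA_orte_ppn             = %s\n", show("OMPI_MCA_orte_ppn"));
        printf("OMPI_MCA_orte_rank            = %s\n", show("OMPI_MCA_orte_rank"));
        printf("OMPI_MCA_oob_tcp_static_ports = %s\n", show("OMPI_MCA_oob_tcp_static_ports"));
        return 0;
    }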
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jun 25, 2010, at 7:23 AM, Philippe wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to run a test program which consists of a server creating a port using MPI_Open_port and N clients using MPI_Comm_connect to connect to the server.
>>>>>>>>>>>
>>>>>>>>>>> I'm able to do so with 1 server and 2 clients, but with 1 server + 3 clients, I get the following error message:
>>>>>>>>>>>
>>>>>>>>>>> [node003:32274] [[37084,0],0]:route_callback tried routing message from [[37084,1],0] to [[40912,1],0]:102, can't find route
>>>>>>>>>>>
>>>>>>>>>>> This is only happening with the openib BTL. With the tcp BTL it works perfectly fine (ofud also works, as a matter of fact...). This has been tested on two completely different clusters, with identical results. In either case, the IB fabric works normally.
>>>>>>>>>>>
>>>>>>>>>>> Any help would be greatly appreciated! Several people in my team looked at the problem. Google and the mailing list archive did not provide any clue. I believe that from an MPI standpoint, my test program is valid (and it works with TCP, which makes me feel better about the sequence of MPI calls).
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Philippe.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Background:
>>>>>>>>>>>
>>>>>>>>>>> I intend to use openMPI to transport data inside a much larger application. Because of that, I cannot use mpiexec. Each process is started by our own "job management" and uses a name server to find out about the others. Once all the clients are connected, I would like the server to do MPI_Recv to get the data from all the clients. I don't care about the order or which clients are sending data, as long as I can receive it with one call. To do that, the clients and the server go through a series of Comm_accept/Comm_connect/Intercomm_merge so that at the end, all the clients and the server are inside the same intracomm.
>>>>>>>>>>>
>>>>>>>>>>> Steps:
>>>>>>>>>>>
>>>>>>>>>>> I have a sample program that shows the issue. I tried to make it as short as possible. It needs to be executed on a shared file system like NFS because the server writes the port info to a file that the clients will read. To reproduce the issue, the following steps should be performed:
>>>>>>>>>>>
>>>>>>>>>>> 0. compile the test with "mpicc -o ben12 ben12.c"
>>>>>>>>>>> 1. ssh to the machine that will be the server
>>>>>>>>>>> 2. run ./ben12 3 1
>>>>>>>>>>> 3. ssh to the machine that will be client #1
>>>>>>>>>>> 4. run ./ben12 3 0
>>>>>>>>>>> 5. repeat steps 3-4 for client #2 and #3
>>>>>>>>>>>
>>>>>>>>>>> The server accepts the connection from client #1 and merges it into a new intracomm. It then accepts the connection from client #2 and merges it. When client #3 arrives, the server accepts the connection, but that causes clients #1 and #2 to die with the error above (see the complete trace in the tarball).
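For readers following along, here is a minimal sketch of the server side of the pattern described above: open a port, publish it on a shared filesystem, then repeatedly accept and merge. It is not the actual ben12.c; the rendezvous file name "port.txt", the argument handling, and the lack of error checking are all simplifications:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* usage (assumed): ./server <nclients> */
    int main(int argc, char **argv)
    {
        int nclients = (argc > 1) ? atoi(argv[1]) : 1;
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm intra = MPI_COMM_SELF;

        MPI_Init(&argc, &argv);

        /* open a port and publish it through a file on a shared filesystem */
        MPI_Open_port(MPI_INFO_NULL, port);
        FILE *f = fopen("port.txt", "w");   /* assumed rendezvous file */
        fprintf(f, "%s\n", port);
        fclose(f);

        /* accept each client in turn and merge it into a growing intracomm */
        for (int i = 0; i < nclients; i++) {
            MPI_Comm inter, merged;
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, intra, &inter);
            MPI_Intercomm_merge(inter, 0 /* server group low */, &merged);
            MPI_Comm_free(&inter);
            if (intra != MPI_COMM_SELF) MPI_Comm_free(&intra);
            intra = merged;
        }

        /* at this point "intra" contains the server and all clients */
        int size;
        MPI_Comm_size(intra, &size);
        printf("final intracomm size: %d\n", size);

        MPI_Comm_free(&intra);
        MPI_Close_port(port);
        MPI_Finalize();
        return 0;
    }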
>>>>>>>>>>>
>>>>>>>>>>> The exact steps are:
>>>>>>>>>>>
>>>>>>>>>>> - server opens port
>>>>>>>>>>> - server does accept
>>>>>>>>>>> - client #1 does connect
>>>>>>>>>>> - server and client #1 do merge
>>>>>>>>>>> - server does accept
>>>>>>>>>>> - client #2 does connect
>>>>>>>>>>> - server, client #1 and client #2 do merge
>>>>>>>>>>> - server does accept
>>>>>>>>>>> - client #3 does connect
>>>>>>>>>>> - server, client #1, client #2 and client #3 do merge
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My InfiniBand network works normally with other test programs or applications (MPI or others like Verbs).
>>>>>>>>>>>
>>>>>>>>>>> Info about my setup:
>>>>>>>>>>>
>>>>>>>>>>> openMPI version = 1.4.1 (I also tried 1.4.2, nightly snapshot of 1.4.3, nightly snapshot of 1.5 --- all show the same error)
>>>>>>>>>>> config.log in the tarball
>>>>>>>>>>> "ompi_info --all" in the tarball
>>>>>>>>>>> OFED version = 1.3, installed from RHEL 5.3
>>>>>>>>>>> Distro = Red Hat Enterprise Linux 5.3
>>>>>>>>>>> Kernel = 2.6.18-128.4.1.el5 x86_64
>>>>>>>>>>> subnet manager = built-in SM from the cisco/topspin switch
>>>>>>>>>>> output of ibv_devinfo included in the tarball (there are no "bad" nodes)
>>>>>>>>>>> "ulimit -l" says "unlimited"
>>>>>>>>>>>
>>>>>>>>>>> The tarball contains:
>>>>>>>>>>>
>>>>>>>>>>> - ben12.c: my test program showing the behavior
>>>>>>>>>>> - config.log / config.out / make.out / make-install.out / ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
>>>>>>>>>>> - trace-tcp.txt: output of the server and each client when it works with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
>>>>>>>>>>> - trace-ib.txt: output of the server and each client when it fails with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
>>>>>>>>>>>
>>>>>>>>>>> I hope I provided enough info for somebody to reproduce the problem...
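A matching sketch of the client side, under the same assumptions (again, not the real ben12.c). After connecting and merging, each client keeps participating in the accept/merge for every later client, because MPI_Comm_accept and MPI_Intercomm_merge are collective over the growing intracomm -- which is exactly why clients #1 and #2 are involved when client #3 joins:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    /* usage (assumed): ./client <nclients> <my_index>   (my_index: 0 = first client) */
    int main(int argc, char **argv)
    {
        int nclients = atoi(argv[1]);
        int my_index = atoi(argv[2]);
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm inter, intra;

        MPI_Init(&argc, &argv);

        /* read the port name the server published on the shared filesystem
         * (retries/synchronization omitted for brevity) */
        FILE *f = fopen("port.txt", "r");   /* assumed rendezvous file */
        fgets(port, sizeof(port), f);
        fclose(f);
        port[strcspn(port, "\n")] = '\0';

        /* connect to the growing intracomm; the server is rank 0 on the other side */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Intercomm_merge(inter, 1 /* client group high */, &intra);
        MPI_Comm_free(&inter);

        /* participate in the accept/merge for every client that joins later */
        for (int later = my_index + 1; later < nclients; later++) {
            MPI_Comm inter2, merged;
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, intra, &inter2);
            MPI_Intercomm_merge(inter2, 0 /* same side as the server */, &merged);
            MPI_Comm_free(&inter2);
            MPI_Comm_free(&intra);
            intra = merged;
        }

        int rank, size;
        MPI_Comm_rank(intra, &rank);
        MPI_Comm_size(intra, &size);
        printf("client %d: rank %d of %d in the final intracomm\n", my_index, rank, size);

        MPI_Comm_free(&intra);
        MPI_Finalize();
        return 0;
    }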
>>>>>>>>>>> <ompi-output.tar.bz2>