It's worth noting that this new component will likely get pulled into 1.5.1 -- we're refreshing a bunch of stuff in that release, and this component will be included in the refresh.
No specific timeline on 1.5.1 yet, though.

On Jul 22, 2010, at 5:53 PM, Ralph Castain wrote:

> Dev trunk looks okay right now - I think you'll be fine using it. My new
> component -might- work with 1.5, but probably not with 1.4. I haven't
> checked either of them.
>
> Anything at r23478 or above will have the new module. Let me know how it
> works for you. I haven't tested it myself, but am pretty sure it should
> work.
>
> On Jul 22, 2010, at 3:22 PM, Philippe wrote:
>
>> Ralph,
>>
>> Thank you so much!!
>>
>> I'll give it a try and let you know.
>>
>> I know it's a tough question, but how stable is the dev trunk? Can I
>> just grab the latest and run, or am I better off taking your changes
>> and copying them back into a stable release? (if so, which one? 1.4?
>> 1.5?)
>>
>> p.
>>
>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> It was easier for me to just construct this module than to explain how
>>> to do so :-)
>>>
>>> I will commit it this evening (couple of hours from now), as that is
>>> our standard practice. You'll need to use the developer's trunk,
>>> though, to use it.
>>>
>>> Here are the envars you'll need to provide.
>>>
>>> Each process needs to get the same following values:
>>>
>>> * OMPI_MCA_ess=generic
>>> * OMPI_MCA_orte_num_procs=<number of MPI procs>
>>> * OMPI_MCA_orte_nodes=<a comma-separated list of nodenames where the
>>>   MPI procs reside>
>>> * OMPI_MCA_orte_ppn=<number of procs/node>
>>>
>>> Note that I have assumed this last value is a constant for simplicity.
>>> If that isn't the case, let me know - you could instead provide it as
>>> a comma-separated list of values with an entry for each node.
>>>
>>> In addition, you need to provide the following value that will be
>>> unique to each process:
>>>
>>> * OMPI_MCA_orte_rank=<MPI rank>
>>>
>>> Finally, you have to provide a range of static TCP ports for use by
>>> the processes. Pick any range that you know will be available across
>>> all the nodes. You then need to ensure that each process sees the
>>> following envar:
>>>
>>> * OMPI_MCA_oob_tcp_static_ports=6000-6010  <== obviously, replace this
>>>   with your range
>>>
>>> You will need a port range that is at least equal to the ppn for the
>>> job (each proc on a node will take one of the provided ports).
>>>
>>> That should do it. I compute everything else I need from those values.
>>>
>>> Does that work for you?
>>> Ralph
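For concreteness, here's a minimal sketch of how a custom launcher might hand Ralph's envars to each process before exec'ing the real binary. This is my illustration, not code from Ralph's module; the node names, counts, port range, and the shim itself are made-up examples, assuming 4 procs at 2 per node:

    /* launcher_shim.c -- hypothetical example, not part of Open MPI.
     * Sets the "generic" ess envars described above, then execs the
     * real MPI program.  Usage: ./launcher_shim <rank> <binary> [args...] */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <rank> <mpi-binary> [args...]\n", argv[0]);
            return 1;
        }

        /* Identical for every process in the job (example values) */
        setenv("OMPI_MCA_ess", "generic", 1);
        setenv("OMPI_MCA_orte_num_procs", "4", 1);               /* total MPI procs */
        setenv("OMPI_MCA_orte_nodes", "node001,node002", 1);     /* comma-separated */
        setenv("OMPI_MCA_orte_ppn", "2", 1);                     /* procs per node  */
        setenv("OMPI_MCA_oob_tcp_static_ports", "6000-6010", 1); /* >= ppn ports    */

        /* Unique per process */
        setenv("OMPI_MCA_orte_rank", argv[1], 1);

        execvp(argv[2], &argv[2]);
        perror("execvp");
        return 1;
    }

With something like this in place, every process sees a consistent picture of the job and can wire up during MPI_Init without mpirun.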
>>> On Jul 22, 2010, at 6:48 AM, Philippe wrote:
>>>
>>>> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> On Jul 21, 2010, at 7:44 AM, Philippe wrote:
>>>>>
>>>>>> Ralph,
>>>>>>
>>>>>> Sorry for the late reply -- I was away on vacation.
>>>>>
>>>>> no problem at all!
>>>>>
>>>>>> regarding your earlier question about how many processes were
>>>>>> involved when the memory was entirely allocated: it was only two, a
>>>>>> sender and a receiver. I'm still trying to pinpoint what can be
>>>>>> different between the standalone case and the "integrated" case. I
>>>>>> will try to find out what part of the code is allocating memory in
>>>>>> a loop.
>>>>>
>>>>> hmmm....that sounds like a bug in your program. let me know what you
>>>>> find
>>>>>
>>>>>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> Well, I finally managed to make this work without the required
>>>>>>> ompi-server rendezvous point. The fix is only in the devel trunk
>>>>>>> right now - I'll have to ask the release managers for 1.5 and 1.4
>>>>>>> if they want it ported to those series.
>>>>>>
>>>>>> great -- i'll give it a try
>>>>>>
>>>>>>> On the notion of integrating OMPI into your launch environment:
>>>>>>> remember that we don't necessarily require that you use mpiexec
>>>>>>> for that purpose. If your launch environment provides just a
>>>>>>> little info in the environment of the launched procs, we can
>>>>>>> usually devise a method that allows the procs to perform an
>>>>>>> MPI_Init as a single job without all this work you are doing.
>>>>>>
>>>>>> I'm working on creating operators using MPI for the IBM product
>>>>>> "InfoSphere Streams". It has its own launching mechanism to start
>>>>>> the processes. However, I can pass some information to the
>>>>>> processes that belong to the same job (Streams job -- which should
>>>>>> neatly map to an MPI job).
>>>>>>
>>>>>>> Only difference is that your procs will all block in MPI_Init
>>>>>>> until they -all- have executed that function. If that isn't a
>>>>>>> problem, this would be a much more scalable and reliable method
>>>>>>> than doing it thru massive calls to MPI_Comm_connect.
>>>>>>
>>>>>> in the general case, that would be a problem, but for my prototype,
>>>>>> this is acceptable.
>>>>>>
>>>>>> In general, each process is composed of operators; some may be MPI
>>>>>> related and some may not. But in my case, I know ahead of time
>>>>>> which processes will be part of the MPI job, so I can easily deal
>>>>>> with the fact that they would block on MPI_Init (actually --
>>>>>> MPI_Init_thread, since it's using a lot of threads).
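As an aside, the init pattern Philippe is describing looks roughly like the following. This is a generic sketch of threaded MPI initialization, not InfoSphere Streams code, and the choice of MPI_THREAD_MULTIPLE is my assumption about what a heavily threaded operator would need:

    /* Sketch: requesting full thread support at init time.  In the
     * startup scheme discussed above, this call blocks until every
     * process in the job has entered init. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "wanted MPI_THREAD_MULTIPLE, got level %d\n",
                    provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* ... operator threads may now make MPI calls concurrently ... */

        MPI_Finalize();
        return 0;
    }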
>>>>> We have talked in the past about creating a non-blocking MPI_Init
>>>>> as an extension to the standard. It would lock you to Open MPI,
>>>>> though...
>>>>>
>>>>> Regardless, at some point you would have to know how many processes
>>>>> are going to be part of the job so you can know when MPI_Init is
>>>>> complete. I would think you would require that info for the
>>>>> singleton wireup anyway - yes? Otherwise, how would you know when to
>>>>> quit running connect-accept?
>>>>
>>>> the short answer is yes... although the longer answer is a bit more
>>>> complicated. currently I do know the number of connects I need to do
>>>> on a per-port basis. a job can contain an arbitrary number of MPI
>>>> processes, each opening one or more ports. so I know the count port
>>>> by port, but I don't need to worry about how many MPI processes there
>>>> are globally. to make things a bit more complicated, each MPI
>>>> operator can be "fused" with other operators to make a process, and
>>>> each fused operator may or may not require MPI. the bottom line is,
>>>> to get the total number of processes to calculate rank & size, I need
>>>> to reverse-engineer the fusing that the compiler may do.
>>>>
>>>> but that's ok, I'm willing to do that for our prototype :-)
>>>>
>>>>>> Is there a documentation or example I can use to see what
>>>>>> information I can pass to the processes to enable that? Is it just
>>>>>> environment variables?
>>>>>
>>>>> No real documentation - a lack I should probably fill. At the
>>>>> moment, we don't have a "generic" module for standalone launch, but
>>>>> I can create one as it is pretty trivial. I would then need you to
>>>>> pass each process envars telling it the total number of processes in
>>>>> the MPI job, its rank within that job, and a file where some
>>>>> rendezvous process (can be rank=0) has provided that port string.
>>>>> Armed with that info, I can wire up the job.
>>>>>
>>>>> Won't be as scalable as an mpirun-initiated startup, but will be
>>>>> much better than doing it from singletons.
>>>>
>>>> that would be great. I can definitely pass environment variables to
>>>> each process.
>>>>
>>>>> Or if you prefer, we could set up an "infosphere" module that we
>>>>> could customize for this system. Main thing here would be to provide
>>>>> us with some kind of regex (or access to a file containing the info)
>>>>> that describes the map of rank to node, so we can construct the
>>>>> wireup communication pattern.
>>>>
>>>> i think for our prototype we are fine with the first method. I'd
>>>> leave the cleaner implementation as a task for the product team ;-)
>>>>
>>>> regarding the "generic" module, is that something you can put
>>>> together quickly? can I help in any way?
>>>>
>>>> Thanks!
>>>> p
>>>>
>>>>> Either way would work. The second is more scalable, but I don't know
>>>>> if you have (or can construct) the map info.
>>>>>
>>>>>> Many thanks!
>>>>>> p.
>>>>>>
>>>>>>> On Jul 18, 2010, at 4:09 PM, Philippe wrote:
>>>>>>>
>>>>>>>> Ralph,
>>>>>>>>
>>>>>>>> thanks for investigating.
>>>>>>>>
>>>>>>>> I've applied the two patches you mentioned earlier and ran with
>>>>>>>> the ompi-server. Although I was able to run our standalone test,
>>>>>>>> when I integrated the changes into our code, the processes
>>>>>>>> entered a crazy loop and allocated all the available memory when
>>>>>>>> calling MPI_Comm_connect. I was not able to identify why it works
>>>>>>>> standalone but not integrated with our code. If I find out why,
>>>>>>>> I'll let you know.
>>>>>>>>
>>>>>>>> looking forward to your findings. We'll be happy to test any
>>>>>>>> patches if you have some!
>>>>>>>>
>>>>>>>> p.
>>>>>>>>
>>>>>>>> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>> Okay, I can reproduce this problem. Frankly, I don't think this
>>>>>>>>> ever worked with OMPI, and I'm not sure how the choice of BTL
>>>>>>>>> makes a difference.
>>>>>>>>>
>>>>>>>>> The program is crashing in the communicator definition, which
>>>>>>>>> involves a communication over our internal out-of-band messaging
>>>>>>>>> system. That system has zero connection to any BTL, so it should
>>>>>>>>> crash either way.
>>>>>>>>>
>>>>>>>>> Regardless, I will play with this a little as time allows.
>>>>>>>>> Thanks for the reproducer!
>>>>>>>>>
>>>>>>>>> On Jun 25, 2010, at 7:23 AM, Philippe wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm trying to run a test program which consists of a server
>>>>>>>>>> creating a port using MPI_Open_port and N clients using
>>>>>>>>>> MPI_Comm_connect to connect to the server.
>>>>>>>>>>
>>>>>>>>>> I'm able to do so with 1 server and 2 clients, but with 1
>>>>>>>>>> server + 3 clients, I get the following error message:
>>>>>>>>>>
>>>>>>>>>> [node003:32274] [[37084,0],0]:route_callback tried routing
>>>>>>>>>> message from [[37084,1],0] to [[40912,1],0]:102, can't find
>>>>>>>>>> route
>>>>>>>>>>
>>>>>>>>>> This is only happening with the openib BTL.
>>>>>>>>>> With the tcp BTL it works perfectly fine (ofud also works, as
>>>>>>>>>> a matter of fact...). This has been tested on two completely
>>>>>>>>>> different clusters, with identical results. In either case,
>>>>>>>>>> the IB fabric works normally.
>>>>>>>>>>
>>>>>>>>>> Any help would be greatly appreciated! Several people on my
>>>>>>>>>> team have looked at the problem. Google and the mailing list
>>>>>>>>>> archive did not provide any clue. I believe that from an MPI
>>>>>>>>>> standpoint, my test program is valid (and it works with TCP,
>>>>>>>>>> which makes me feel better about the sequence of MPI calls).
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Philippe.
>>>>>>>>>>
>>>>>>>>>> Background:
>>>>>>>>>>
>>>>>>>>>> I intend to use Open MPI to transport data inside a much larger
>>>>>>>>>> application. Because of that, I cannot use mpiexec. Each
>>>>>>>>>> process is started by our own "job management" and uses a name
>>>>>>>>>> server to find out about the others. Once all the clients are
>>>>>>>>>> connected, I would like the server to do MPI_Recv to get the
>>>>>>>>>> data from all the clients. I don't care about the order or
>>>>>>>>>> which client is sending data, as long as I can receive it with
>>>>>>>>>> one call. To do that, the clients and the server go through a
>>>>>>>>>> series of Comm_accept/Comm_connect/Intercomm_merge calls so
>>>>>>>>>> that, at the end, all the clients and the server are inside the
>>>>>>>>>> same intracomm.
>>>>>>>>>>
>>>>>>>>>> Steps:
>>>>>>>>>>
>>>>>>>>>> I have a sample program that shows the issue. I tried to make
>>>>>>>>>> it as short as possible. It needs to be executed on a shared
>>>>>>>>>> filesystem like NFS, because the server writes the port info to
>>>>>>>>>> a file that the clients will read. To reproduce the issue, the
>>>>>>>>>> following steps should be performed:
>>>>>>>>>>
>>>>>>>>>> 0. compile the test with "mpicc -o ben12 ben12.c"
>>>>>>>>>> 1. ssh to the machine that will be the server
>>>>>>>>>> 2. run ./ben12 3 1
>>>>>>>>>> 3. ssh to the machine that will be client #1
>>>>>>>>>> 4. run ./ben12 3 0
>>>>>>>>>> 5. repeat steps 3-4 for clients #2 and #3
>>>>>>>>>>
>>>>>>>>>> The server accepts the connection from client #1 and merges it
>>>>>>>>>> into a new intracomm. It then accepts the connection from
>>>>>>>>>> client #2 and merges it. When client #3 arrives, the server
>>>>>>>>>> accepts the connection, but that causes clients #1 and #2 to
>>>>>>>>>> die with the error above (see the complete trace in the
>>>>>>>>>> tarball).
>>>>>>>>>>
>>>>>>>>>> The exact steps are:
>>>>>>>>>>
>>>>>>>>>> - server opens port
>>>>>>>>>> - server does accept
>>>>>>>>>> - client #1 does connect
>>>>>>>>>> - server and client #1 do merge
>>>>>>>>>> - server does accept
>>>>>>>>>> - client #2 does connect
>>>>>>>>>> - server, client #1 and client #2 do merge
>>>>>>>>>> - server does accept
>>>>>>>>>> - client #3 does connect
>>>>>>>>>> - server, client #1, client #2 and client #3 do merge
>>>>>>>>>>
>>>>>>>>>> My InfiniBand network works normally with other test programs
>>>>>>>>>> and applications (MPI and others, like Verbs).
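To make that sequence concrete, here is a compact reconstruction of the accept/connect/merge pattern. This is my sketch, not the actual ben12.c from the tarball; the "port.txt" filename, the argument handling, and the absence of error checking are simplifications:

    /* Reconstruction of the accept/connect/merge pattern described
     * above -- NOT the actual ben12.c.
     * Usage: ./demo <nclients> <is_server> */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int nclients  = atoi(argv[1]);   /* total clients expected  */
        int is_server = atoi(argv[2]);   /* 1 = server, 0 = client  */
        char port[MPI_MAX_PORT_NAME] = "";
        MPI_Comm intra;                  /* grows with every merge  */
        int joined = 0;                  /* clients merged in so far */
        int i;

        MPI_Init(&argc, &argv);

        if (is_server) {
            MPI_Open_port(MPI_INFO_NULL, port);
            FILE *f = fopen("port.txt", "w");   /* shared filesystem (NFS) */
            fprintf(f, "%s\n", port);
            fclose(f);
            MPI_Comm_dup(MPI_COMM_SELF, &intra);
        } else {
            FILE *f = fopen("port.txt", "r");
            fgets(port, sizeof(port), f);
            fclose(f);
            port[strcspn(port, "\n")] = '\0';

            MPI_Comm inter, merged;
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
            MPI_Intercomm_merge(inter, 1, &merged);  /* join as high group */
            MPI_Comm_disconnect(&inter);
            intra = merged;
            MPI_Comm_rank(intra, &joined);  /* my rank == my join order */
        }

        /* Everyone already merged helps accept the remaining clients:
         * MPI_Comm_accept is collective over the growing intracomm. */
        for (i = joined; i < nclients; i++) {
            MPI_Comm inter, merged;
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, intra, &inter);
            MPI_Intercomm_merge(inter, 0, &merged);  /* existing = low group */
            MPI_Comm_disconnect(&inter);
            intra = merged;
        }

        if (is_server)
            MPI_Close_port(port);

        /* intra now contains the server and all clients */
        MPI_Finalize();
        return 0;
    }

The key design point is that every already-merged process must participate in each subsequent MPI_Comm_accept, since that call is collective over the growing intracomm; each client infers how many accepts remain from its rank in the merged communicator.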
>>>>>>>>>> Info about my setup:
>>>>>>>>>>
>>>>>>>>>> Open MPI version = 1.4.1 (I also tried 1.4.2, nightly snapshots
>>>>>>>>>> of 1.4.3 and 1.5 --- all show the same error)
>>>>>>>>>> config.log in the tarball
>>>>>>>>>> "ompi_info --all" in the tarball
>>>>>>>>>> OFED version = 1.3, installed from RHEL 5.3
>>>>>>>>>> Distro = Red Hat Enterprise Linux 5.3
>>>>>>>>>> Kernel = 2.6.18-128.4.1.el5 x86_64
>>>>>>>>>> subnet manager = built-in SM from the Cisco/Topspin switch
>>>>>>>>>> output of ibv_devinfo included in the tarball (there are no
>>>>>>>>>> "bad" nodes)
>>>>>>>>>> "ulimit -l" says "unlimited"
>>>>>>>>>>
>>>>>>>>>> The tarball contains:
>>>>>>>>>>
>>>>>>>>>> - ben12.c: my test program showing the behavior
>>>>>>>>>> - config.log / config.out / make.out / make-install.out /
>>>>>>>>>>   ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
>>>>>>>>>> - trace-tcp.txt: output of the server and each client when it
>>>>>>>>>>   works with TCP (I added "btl = tcp,self" in
>>>>>>>>>>   ~/.openmpi/mca-params.conf)
>>>>>>>>>> - trace-ib.txt: output of the server and each client when it
>>>>>>>>>>   fails with IB (I added "btl = openib,self" in
>>>>>>>>>>   ~/.openmpi/mca-params.conf)
>>>>>>>>>>
>>>>>>>>>> I hope I provided enough info for somebody to reproduce the
>>>>>>>>>> problem...
>>>>>>>>>>
>>>>>>>>>> <ompi-output.tar.bz2>

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/