Re: [OMPI users] Issues with DL POLY
Very interesting; I certainly hope that my problem is this and not some kind of error. I'll put the program on some more nodes, run some tests, and see what runs fastest. My only experience so far with MPI is with LAMMPS, and the simulation I ran had an almost linear speedup from 1 to 10 machines (11 hours down to 1.2 hours), which was very satisfying!

Aaron Thompson
Vanderbilt University
aaron.p.thomp...@vanderbilt.edu

On Jun 7, 2007, at 8:44 PM, Brock Palen wrote:

We have a few users running DL POLY with Open MPI just fine. Watch out for what kind of simulation you are doing: as with all MD software, not all simulations are better in parallel. In some, the communication overhead is much worse than running on just one CPU; I see this all the time. You could try just 2 CPUs on one node; sometimes that is fine (memory access vs. network access). But it's not uncommon.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985

On Jun 7, 2007, at 8:24 PM, Aaron Thompson wrote:

Hello,

Does anyone have experience using DL POLY with Open MPI? I've gotten it to compile, but when I run a simulation using mpirun on two dual-processor machines, it runs a little *slower* than on one CPU of one machine, even though the program is running two instances on each node. Any ideas? The test programs included with Open MPI show that it is running correctly across multiple nodes. Sorry if this is a little off-topic; I wasn't able to find help on the official DL POLY mailing list. Thank you!

Aaron Thompson
Vanderbilt University
aaron.p.thomp...@vanderbilt.edu
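A concrete way to run the intra-node vs. inter-node comparison Brock suggests is to pin both processes to one host and then split them across two. This is a minimal sketch; the hostnames and the DL POLY executable name (DLPOLY.X here) are placeholders for your own:

    # Both ranks on one node: shared-memory communication only
    mpirun --host node1,node1 -np 2 ./DLPOLY.X

    # One rank per node: communication crosses the network
    mpirun --host node1,node2 -np 2 ./DLPOLY.X

If the single-node run is markedly faster, the network overhead rather than the code is the likely bottleneck.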
[OMPI users] mpirun in openmpi-1.2.2 fails to exit after client program finishes
I compiled openmpi-1.2.2 with:

./configure CFLAGS="-g -pg -O3" --prefix=/home/foo/490_research/490/src/mpi.optimized_profiling/ \
  --enable-mpi-threads --enable-progress-threads --enable-static --disable-shared --without-memory-manager \
  --without-libnuma --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx --disable-mpi-cxx-seek --disable-dlopen

(Thanks Jeff, now I know that I have to add --without-memory-manager and --without-libnuma for static linking.)

make all
make install

Then I run my client app with:

/home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun --hostfile ../hostfile -n 32 raytrace -finputs/car.env

The program runs well and each process completes successfully. I can tell because all processes have generated gmon.out, and a "ps aux" on the slave nodes (all except the originating node) shows that my program there has already exited (it is nonexistent). Therefore I think this may have something to do with mpirun, which hangs forever.

Can you see anything wrong in my ./configure command which explains the mpirun hang at the end of the run? How can I fix it? Thanks!
Re: [OMPI users] how to identify openmpi in configure script
Would it be helpful if we provided some way to link in all the MPI language bindings? Examples off the top of my head (I haven't thought any of these through):

- mpicxx_all ...
- setenv OMPI_WRAPPER_WANT_ALL_LANGUAGE_BINDINGS; mpicxx ...
- mpicxx -ompi:all_languages ...

On Jun 6, 2007, at 12:05 PM, Lie-Quan Lee wrote:

Hi Jeff,

Thanks for being willing to put more thought into it. Here is my simplified story. I have an accelerator physics code, Omega3P, that performs complex eigenmode analysis. The algorithm for solving eigensystems makes use of a 3rd-party sparse direct solver called MUMPS (http://graal.ens-lyon.fr/MUMPS/). Omega3P is written in C++ with MPI. MUMPS is written in Fortran 95 with the MPI Fortran binding, and MUMPS requires ScaLAPACK and BLACS (sometimes the vendor provides a scientific library that includes both). Those are written in Fortran 77 with the MPI Fortran binding.

I often need to compile them on various computer platforms with different compilers, for a variety of reasons. As I mentioned before, I use the C++ compiler to link the final executable, which requires the MPI Fortran libraries and the general Fortran runtime libraries.

What I did to solve this is write a configure script that detects the compiler and the platform and, based on that, adds compiler- and platform-specific flags for the Fortran-related libraries and library paths. This works well until it hits the next new platform/compiler...

Some compilers make this job slightly easier. For example, the Pathscale compiler collection provides -lpathfortran for everything I need to link an executable with the C++ compiler against Fortran-compiled libraries. The same is true of the IBM Visual Age compiler set if the wrapper compilers (mpcc_r, mpf90_r) are used, though the library name (-lxlf90_r) is different.

best regards,
Rich Lee

On Jun 6, 2007, at 4:16 AM, Jeff Squyres wrote:

On Jun 5, 2007, at 11:17 PM, Lie-Quan Lee wrote:

It is quite a headache to deal with mixed-language issues for each compiler/platform. I have to compile my application with the IBM Visual Age compiler, Pathscale, the Cray X1E compiler, intel/openmpi, intel/mpich, the PGI compiler... And of course, Open MPI 1.1 differs from 1.2.2 in this respect (-lmpi_f77 is new in 1.2.2). :-)

You are right, the MPI Forum most likely will not take care of this. I just made a wish... :-)

Understood; I know it's a pain. :-(

What I want to understand, however, is what you need to do. It seems like your needs are a bit different than those of the mainstream -- is there a way that we can support you directly instead of forcing you to a) identify Open MPI, b) call the wrapper's --showme:link to get the relevant flags, and c) stitch them together in the manner that you need?

We take great pains to ensure that the MPI wrapper compilers "just work" for all the common cases in order to avoid all the "you must identify which MPI you are using" kinds of games. Your case sounds somewhat unusual, but perhaps there's a way we can get the information to you in a more direct manner...?

-- 
Jeff Squyres
Cisco Systems
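For reference, one way a configure script can do the identification this thread is about: it assumes Open MPI's mpi.h defines the OPEN_MPI macro and that the wrapper compilers accept --showme:link (mentioned above). Treat this as an illustrative sketch, not an official recipe:

    # Probe for Open MPI, then harvest the Fortran link flags for the
    # C++ link line.
    printf '#include <mpi.h>\n#ifndef OPEN_MPI\n#error not Open MPI\n#endif\nint main(void){return 0;}\n' > conftest.c
    if mpicc -c conftest.c -o conftest.o 2>/dev/null; then
        # It is Open MPI: ask the Fortran wrapper what it would link
        MPI_FORTRAN_LIBS=`mpif77 --showme:link`
    fi
    rm -f conftest.c conftest.o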
Re: [OMPI users] how to identify openmpi in configure script
On Fri, 8 Jun 2007, Jeff Squyres wrote:

> Would it be helpful if we provided some way to link in all the MPI
> language bindings?
>
> Examples off the top of my head (haven't thought any of these through):
>
> - mpicxx_all ...
> - setenv OMPI_WRAPPER_WANT_ALL_LANGUAGE_BINDINGS
>   mpicxx ...
> - mpicxx -ompi:all_languages ...

Maybe this wrapper should be called "mpild" or "mpilinker".

A.Chan
[OMPI users] v1.2.2 mca base unable to open pls/ras tm
Hi,

I uninstalled and deleted our old installation directories of 1.1.4 and 1.2.1 so I could have it nice and clean for 1.2.2. I extracted the source and ran configure with these options:

--prefix=/opt/openmpi/st --with-devel-headers --with-tm=/opt/torque

I then built and installed it. But when I ran a program, I got these messages from each of my processes:

: mca: base: component_find: unable to open pls tm: File not found (ignored)
: mca: base: component_find: unable to open ras tm: File not found (ignored)

This was the first time that Open MPI was configured with --with-tm, as Torque wasn't installed previously. I found out that Torque was not installed to /opt/torque as it was supposed to be, but to its default location. So I reran configure without --with-tm and rebuilt and reinstalled (after uninstalling the previous build). But I still got the same messages.

So I completely deleted the source directory and the destination directory, extracted the source, ran configure, and rebuilt and installed. But still the same errors. According to the Open MPI FAQ, support for Torque must be explicitly added via configure (http://www.open-mpi.org/faq/?category=building#build-rte-tm). So is it still including it somehow?

Thanks,
Matt

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis
Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm
"File not found" is the strerror corresponding to the error we get when we call dlopen. So I don't think it's directly related to the mca_pls_tm.so library but to one of it's missing dependencies. Do you have access to the /opt/torque directory on all nodes in your cluster ? george. On Jun 8, 2007, at 1:22 PM, Cupp, Matthew R wrote: Hi, I uninstalled and deleted our old installation directories of 1.1.4 and 1.2.1 so I could have it nice and clean for 1.2.2. I extracted the source and ran configure with these options: --prefix=/opt/openmpi/st --with-devel-headers --with-tm=/opt/torque I then build and installed it. But when I ran a program I got these messages from each of my processes: : mca: base: component_find: unable to open pls tm: File not found (ignored) : mca: base: component_find: unable to open ras tm: File not found (ignored) This was the first time that Open MPI was configured with –with-tm as torque wasn’t installed previously. I found out that torque was not installed to /opt/torque as it was supposed to be, but to it’s default location. So I reran the configure without --with-tm and rebuilt and reinstalled (after uninstalling the previous build). But I still got the same messages. So I completely deleted the source directory and destination directory, extract the source, ran configure, rebuild and installed. But still the same errors. According to the open mpi faq, support for torque must be explicitly added via configure. (http://www.open-mpi.org/faq/?category=building#build-rte-tm) So is it still including it somehow? Thanks, Matt __ Matt Cupp Battelle Memorial Institute Statistics and Information Analysis ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users smime.p7s Description: S/MIME cryptographic signature
Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm
Yes. But the /opt/torque directory is just the source, not the actual installation directory. The actual installation directory on the head node is the default location of /usr/lib/something, and that is not accessible from every node.

But should it matter that it's not accessible if I don't specify --with-tm? I was wondering if ./configure detects that Torque has been installed and then builds the associated components under the assumption that it's available.

Matt

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis
614-424-5471
Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm
On Jun 8, 2007, at 2:06 PM, Cupp, Matthew R wrote:

> But should it matter that it's not accessible if I don't specify
> --with-tm? I was wondering if ./configure detects that Torque has been
> installed and then builds the associated components under the
> assumption that it's available.

This is what OMPI does. However, if you only have static libraries for Torque, the issue should be moot -- the relevant bits should be statically linked into the OMPI tm plugins. But if your Torque libraries are shared, then you do need to have them available on all nodes for OMPI to be able to leverage native Torque/TM support.

Make sense?

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI users] mpirun in openmpi-1.2.2 fails to exit after client program finishes
On Jun 8, 2007, at 9:29 AM, Code Master wrote:

> I compiled openmpi-1.2.2 with:
>
> ./configure CFLAGS="-g -pg -O3" --prefix=/home/foo/490_research/490/src/mpi.optimized_profiling/ \
>   --enable-mpi-threads --enable-progress-threads --enable-static --disable-shared --without-memory-manager \
>   --without-libnuma --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx --disable-mpi-cxx-seek --disable-dlopen
>
> (Thanks Jeff, now I know that I have to add --without-memory-manager
> and --without-libnuma for static linking.)

Good.

> make all
> make install
>
> Then I run my client app with:
>
> /home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun --hostfile ../hostfile -n 32 raytrace -finputs/car.env
>
> The program runs well and each process completes successfully: all
> processes have generated gmon.out, and a "ps aux" on the slave nodes
> shows that my program there has already exited. Therefore I think this
> may have something to do with mpirun, which hangs forever.

Be aware that you may have problems with multiple processes writing to the same gmon.out, unless you're running each instance in a different directory (your command line doesn't indicate that you are, but that doesn't necessarily prove anything).

> Can you see anything wrong in my ./configure command which explains
> the mpirun hang at the end of the run? How can I fix it?

No, everything looks fine. So you confirm that all raytrace instances have exited and all orteds have exited, leaving *only* mpirun running? There was a race condition about this at one point; Ralph -- can you comment further?

-- 
Jeff Squyres
Cisco Systems
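The gmon.out caution is easy to guard against in code: give each rank its own working directory right after MPI_Init, so each gprof profile lands in a separate file. A minimal sketch (the directory naming scheme is illustrative, not taken from the poster's program):

    /* Give each rank a private working directory so the per-process
       gmon.out files written at exit do not collide. */
    #include <mpi.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        char dir[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        snprintf(dir, sizeof(dir), "rank_%d", rank);
        mkdir(dir, 0755);   /* EEXIST is harmless on reruns */
        chdir(dir);         /* gprof writes gmon.out in the cwd at exit */

        /* ... application work ... */

        MPI_Finalize();
        return 0;
    }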
Re: [OMPI users] Communication Latency
The answer is "it depends"; there are a lot of factors involved:

- What is the topology of your network?
- Where do processes land within the topology of the network?
- What interconnect are you using? (e.g., the openib BTL will usually use short-message RDMA to a limited set of peers as an optimization)
- How long are your messages?

OMPI does not have any special optimizations for point-to-point communications between MPI_COMM_WORLD ranks that happen to be powers of two. Other factors may contribute to make that true for your runs, but there's nothing hard-coded in Open MPI for that.

On Jun 5, 2007, at 1:10 PM, Andy Georgi wrote:

Hi everybody,

I'm new on this list and started using Open MPI for my parallel jobs. The first step was to measure the latency of the blocking communication functions. Now my first question: is it possible that certain communication pairs are optimized? Background: the latency for special process numbers is nearly 25% smaller, e.g. for processes 1, 2, 4, 8, 16, 32, 64... (every computer scientist should see the pattern ;-)). It doesn't matter from which process I send the message: if the receiver is one of these processes, I get the best latency values. This effect can't come from the network alone, because communication from proc 5 to proc 32, for example, is faster than communication from proc 32 to proc 5. I've tried it with Open MPI for Intel 1.1.4 and 1.2.2 and Open MPI for PGI 1.2.2, always with the same results. Now I think it must be some kind of optimization; if it is, I would like to know, because then I'd have an explanation ;-).

thx and regards,
Andy

-- 
Jeff Squyres
Cisco Systems
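For reference, a minimal version of the kind of measurement described above; this is a generic ping-pong sketch, not Andy's actual benchmark, and the peer rank and repetition count are arbitrary:

    /* Ping-pong latency between rank 0 and rank PEER. */
    #include <mpi.h>
    #include <stdio.h>

    #define PEER 1
    #define REPS 1000

    int main(int argc, char **argv)
    {
        int rank, i;
        char byte = 0;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, PEER, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, PEER, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == PEER) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        /* Half the round-trip time is the one-way latency estimate */
        if (rank == 0)
            printf("latency: %g us\n", (t1 - t0) / REPS / 2 * 1e6);

        MPI_Finalize();
        return 0;
    }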
Re: [OMPI users] Issues running a basic program with spawn
On Jun 5, 2007, at 10:27 AM, Prakash Velayutham wrote:

> I know. I could not start another client code before this, so I just
> wanted to check whether /bin/hostname works with the spawn.

It will not. MPI_COMM_SPAWN assumes that you are spawning an MPI application, and therefore after the process is launched, it tries to do MPI-level coordination with it to set up new communicators, etc. FWIW: MPI-2 says that you are *only* allowed to launch MPI processes through MPI_COMM_SPAWN[_MULTIPLE].

This could well be the error that you are seeing (I haven't tried it myself to see what would happen).

-- 
Jeff Squyres
Cisco Systems
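To illustrate the point: the command given to MPI_Comm_spawn must name another MPI program. The sketch below assumes a child executable called "worker" (a placeholder) that itself calls MPI_Init:

    /* Parent: spawn 4 copies of an MPI child program. Spawning a non-MPI
       binary such as /bin/hostname fails, because the child never performs
       the MPI-level handshake described above. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm child;

        MPI_Init(&argc, &argv);
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);

        /* ... communicate with the children over the "child"
           intercommunicator ... */

        MPI_Comm_disconnect(&child);
        MPI_Finalize();
        return 0;
    }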
Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm
So I either have to uninstall Torque, make the shared Torque libraries available on all nodes, or install Torque as static libraries on the head node?

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis
Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm
Or tell Open MPI not to build Torque support at all, which can be done at configure time with the --without-tm option. Open MPI tries to build support for whatever it finds in the default search paths, plus whatever you specify the location of. Most of the time, this is what the user wants. In this case, however, it's not what you wanted, so you'll have to add the --without-tm option.

Hope this helps,

Brian

On Jun 8, 2007, at 1:08 PM, Cupp, Matthew R wrote:

> So I either have to uninstall Torque, make the shared Torque libraries
> available on all nodes, or install Torque as static libraries on the
> head node?
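Concretely, the rebuild would look something like the following sketch, reusing the prefix from earlier in this thread:

    # Reconfigure without Torque support so the tm plugins are never built
    ./configure --prefix=/opt/openmpi/st --with-devel-headers --without-tm
    make all install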
Re: [OMPI users] Issues running a basic program with spawn
My apologies; Prakash and I solved this off-list. I should have posted the final solution here too so any interested parties would know the answer.

The problem actually is a bug that broke comm_spawn in 1.2.2, and it may well be present in the entire 1.2 code series (I have not checked the prior sub-releases). I provided a patch to Prakash that solves the problem, and have requested that a slightly different version be released as part of 1.2.3.

Sorry for forgetting to post this back to the list. Anyone needing the patch for 1.2.2 prior to the next sub-release should just let me know and I'll provide it.

Ralph
Re: [OMPI users] mixing MX and TCP
A fix for this problem is now available on the trunk. Please use any revision after 14963 and your problem will vanish (I hope!). There are now some additional parameters which allow you to select which Myrinet network you want to use in case there are several available (--mca btl_mx_if_include and --mca btl_mx_if_exclude). Even multi-rail should now work over MX.

george.

On May 31, 2007, at 12:09 PM, Kees Verstoep wrote:

Hi,

I am currently experimenting with Open MPI in a multi-cluster setting where each cluster has its private Myri-10G/MX network besides TCP. Somehow I was under the assumption that Open MPI would dynamically find out the details of this configuration and use MX where possible (i.e., intra-cluster) and TCP elsewhere. But from some initial testing, it appears Open MPI 1.2.1 assumes global connectivity over MX whenever every participating host supports MX: I see MX rather than TCP-level connections being tried between clusters, which fails in mx_connect/mx_isend (at the moment there is no inter-cluster support in MX itself). Besides "mx", I do include "tcp" in the network option lists, of course. Is this just something that is not yet supported in the current release, or does it work by providing some extra parameters? I have not started digging in the code yet. Thanks!

Kees Verstoep
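With a trunk build that includes the fix, a run mixing both transports might be launched like this. This is a sketch: the value given to btl_mx_if_include is a placeholder, since the accepted interface names depend on the local MX setup.

    # Allow the MX, TCP, and self transports, and name the Myrinet
    # network to use via the new parameter mentioned above
    mpirun --mca btl mx,tcp,self --mca btl_mx_if_include mx0 \
           --hostfile hostfile -n 16 ./app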
Re: [OMPI users] mpirun in openmpi-1.2.2 fails to exit after client program finishes
On 6/9/07, Jeff Squyres wrote:

> Be aware that you may have problems with multiple processes writing
> to the same gmon.out, unless you're running each instance in a
> different directory (your command line doesn't indicate that you are,
> but that doesn't necessarily prove anything).

I am sure this is not happening, because in my program, after the MPI initialization, main() invokes chdir(), which immediately changes to the process's own directory (named after the proc_id). Therefore they all have their own directory to write to.

> No, everything looks fine. So you confirm that all raytrace instances
> have exited and all orteds have exited, leaving *only* mpirun running?

Yes, I am sure that all raytrace instances as well as all MPI-related processes (including mpirun and the orteds) have exited on all slave nodes. On the *master* node, all raytrace instances and all orteds have exited as well, leaving *only* mpirun running:

14818 pts/0 S+ 0:00 /home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun --hostfile ../hostfile -n 32 raytrace -finputs/car.env -s 1