Re: [OMPI users] Network connection check
> Are you asking to find out this information before issuing "mpirun"? Open
> MPI does assume that the nodes you are trying to use are reachable.

No. The scenario is that a pair of processes is running: one on the "master" node, say "masterprocess", and one on the "slave" node, say "slaveprocess". When "masterprocess" needs a service from the slave, it sends a message to "slaveprocess", which serves the request. In case of a network failure of any kind, "masterprocess" will keep trying to send messages to "slaveprocess" without knowing that it is unreachable. So how should "masterprocess" find out that "slaveprocess" cannot be reached, and stop attempting to send messages until the connection is back up?

Thanks & Regards,
--
Vipin K.
Research Engineer, C-DOTB, India
Re: [OMPI users] ifort and gfortran module
Hi Martin,

I have a question about your solution below. In step 2, "move the Fortran module to the directory ..." -- what is the "Fortran module"? And for step 3, don't we need to install openmpi?

thanks

----- Original Message -----
From: "Martin Siegert"
To: "Open MPI Users"
Sent: Monday, July 20, 2009 1:47:35 PM (GMT-0500) America/New_York
Subject: Re: [OMPI users] ifort and gfortran module

Hi,

I want to avoid separate MPI distributions since we compile many MPI software packages. Having more than one MPI distribution (at least) doubles the amount of work. For now I came up with the following solution:

1. Compile openmpi using gfortran as the Fortran compiler and install it in /usr/local/openmpi.

2. Move the Fortran module to the directory /usr/local/openmpi/include/gfortran. In that directory create softlinks to the files in /usr/local/openmpi/include.

3. Compile openmpi using ifort and install the Fortran module in /usr/local/openmpi/include.

4. In /usr/local/openmpi/bin create softlinks mpif90.ifort and mpif90.gfortran pointing to opal_wrapper. Remove the mpif90 softlink.

5. Move /usr/local/openmpi/share/openmpi/mpif90-wrapper-data.txt to /usr/local/openmpi/share/openmpi/mpif90.ifort-wrapper-data.txt. Copy the file to /usr/local/openmpi/share/openmpi/mpif90.gfortran-wrapper-data.txt and change the line
   includedir=${includedir}
to
   includedir=${includedir}/gfortran

6. Create a wrapper script /usr/local/openmpi/bin/mpif90:

#!/bin/bash
OMPI_WRAPPER_FC=`basename $OMPI_FC 2> /dev/null`
if [ "$OMPI_WRAPPER_FC" = 'gfortran' ]; then
    exec $0.gfortran "$@"
else
    exec $0.ifort "$@"
fi
Re: [OMPI users] Network connection check
It depends on which network fails. If you lose all TCP connectivity, Open MPI should abort the job as the out-of-band system will detect the loss of connection. If you only lose the MPI connection (whether TCP or some other interconnect), then I believe the system will eventually generate an error after it retries sending the message a specified number of times, though it may not abort. On Jul 22, 2009, at 10:55 PM, vipin kumar wrote: Are you asking to find out this information before issuing "mpirun"? Open MPI does assume that the nodes you are trying to use are reachable. NO, Scenario is a pair of processes are running one in "master" node say "masterprocess" and one in "slave" node say "slaveprocess". When "masterprocess" needs service of slave process, it sends message to "slaveprocess" and "slaveprocess" serves its request. In case of Network failure(by any means) "masterprocess" will keep trying to send message to "slaveprocess" without knowing that it is not reachable. So how "masterprocess" should finds out that "slaveprocess" can't be reached and leave attempting to send messages till Connection is not up. Thanks & Regards, -- Vipin K. Research Engineer, C-DOTB, India ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released
Jeff Squyres writes: > The MPI ABI has not changed since 1.3.2. Good, thanks. I hadn't had time to investigate the items in the release notes that looked suspicious. Are there actually any known ABI incompatibilities between 1.3.0 and 1.3.2? We haven't noticed any as far as I know. > Note that our internal API's are *not* guaranteed to be ABI compatible > between releases Sure. Thanks for clarifying. I assumed there was a missing negative in the previous answer about it, but it's worth spelling out.
Re: [OMPI users] ifort and gfortran module
Jeff Squyres writes: > See https://svn.open-mpi.org/source/xref/ompi_1.3/README#257. Ah, neat. I'd never thought of that, possibly due to ELF not being relevant when I first started worrying about that sort of thing. > Indeed. In OMPI, we tried to make this as simple as possible. But > unless you use specific compiler options to hide their differences, it > isn't possible and is beyond our purview to fix. :-( Sure. It was a question of whether it's just the interface, in which case flags may help with Fortran. > (similar situation with the C++ bindings) I'd have expected it to be worse, since compilers intentionally have inconsistent name-mangling as I understand it, but I'm not clever enough to understand C++ anyway :-/.
Re: [OMPI users] ifort and gfortran module
Jeff Squyres writes: > I *think* that there are compiler flags that you can use with ifort to > make it behave similarly to gfortran in terms of sizes and constant > values, etc. At a slight tangent, if there are flags that might be helpful to add to gfortran for compatibility (e.g. logical constants), I might be able to do it, though I've not been involved since g77 and haven't had much truck with such interface issues for a while. Does anyone know of any relevant incompatibilities that aren't covered by items in the README?
Re: [OMPI users] Network connection check
On Thu, Jul 23, 2009 at 3:03 PM, Ralph Castain wrote:
> It depends on which network fails. If you lose all TCP connectivity, Open
> MPI should abort the job as the out-of-band system will detect the loss of
> connection. If you only lose the MPI connection (whether TCP or some other
> interconnect), then I believe the system will eventually generate an error
> after it retries sending the message a specified number of times, though it
> may not abort.

Thank you Ralph,

From your reply I realized that the question I posted earlier did not reflect the problem properly.

I can't use blocking communication routines in my main program ("masterprocess") because any type of network failure (whether physical connectivity, TCP connectivity, or the MPI connection, as you said) may occur. So I am using non-blocking point-to-point communication routines and TESTing later for completion of that request. Once I enter the TEST loop, I test for request completion until a TIMEOUT. Suppose the TIMEOUT has occurred. In that case I will first check whether:

1: the slave machine is reachable or not (how will I do that, given that I have the IP address and host name of the slave machine?);

2: if reachable, the programs (orted and "slaveprocess") are alive or not.

I don't want to abort my master process in case 1; I hope that the network connection will come up again in the future. Fortunately Open MPI doesn't abort any process: both processes can run independently without communicating.

Thanks and Regards,
--
Vipin K.
Research Engineer, C-DOTB, India
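For readers following along, here is a minimal sketch of the non-blocking send plus test-until-timeout pattern described above, with MPI_Wtime() driving the timeout. The TIMEOUT_SEC value and the handling of an expired timer are illustrative assumptions, not anything prescribed in this thread; as noted later in the thread, hitting the timeout only means the transfer has not completed yet, not that it has failed.

#include <mpi.h>

#define TIMEOUT_SEC 5.0   /* illustrative value, not from the thread */

/* Post a non-blocking send and poll MPI_Test until it completes or a
 * wall-clock timeout expires.  Returns 1 if the send completed, 0 if the
 * timeout was reached (the request is then still pending, so the buffer
 * must not be reused until the request eventually completes). */
int send_with_timeout(void *buf, int count, MPI_Datatype type,
                      int dest, int tag, MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;
    double t0 = MPI_Wtime();

    MPI_Isend(buf, count, type, dest, tag, comm, &req);
    while (!done && (MPI_Wtime() - t0) < TIMEOUT_SEC)
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);

    /* A timeout only means the transfer has not completed yet; it does not
     * by itself prove the peer is unreachable. */
    return done;
}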
Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released
On Jul 23, 2009, at 6:39 AM, Dave Love wrote:
> > The MPI ABI has not changed since 1.3.2.
>
> Good, thanks. I hadn't had time to investigate the items in the release
> notes that looked suspicious. Are there actually any known ABI
> incompatibilities between 1.3.0 and 1.3.2? We haven't noticed any as far
> as I know.

It *might* work? To be honest, I would be surprised, though -- it may fail in subtle, non-obvious ways (i.e., during execution, not startup/linking). We made some changes in 1.3.2 in order to freeze the ABI for the future that *probably* have disruptive effects in seamlessly working with prior versions (there were some strange technical issues involving OMPI's use of pointers for MPI handles -- I can explain more if you care).

FWIW: the changes we made were in the back-end/internals of libmpi; source-code compatibility has been maintained since MPI-1.0 (aside from a handful of bugs in the MPI API that we have fixed over time -- e.g., a wrong parameter type in an MPI API function, etc.).

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] ifort and gfortran module
FWIW, for the Fortran MPI programmers out there, the MPI Forum is hard at work on a new Fortran 03 set of bindings for MPI-3. We have a prototype in a side branch of Open MPI that is "mostly" working. We (the MPI Forum) expect to release a short document describing the new features and the prototype Open MPI implementation for larger Fortran community comment within a few months. On Jul 23, 2009, at 7:03 AM, Dave Love wrote: Jeff Squyres writes: > See https://svn.open-mpi.org/source/xref/ompi_1.3/README#257. Ah, neat. I'd never thought of that, possibly due to ELF not being relevant when I first started worrying about that sort of thing. > Indeed. In OMPI, we tried to make this as simple as possible. But > unless you use specific compiler options to hide their differences, it > isn't possible and is beyond our purview to fix. :-( Sure. It was a question of whether it's just the interface, in which case flags may help with Fortran. > (similar situation with the C++ bindings) I'd have expected it to be worse, since compilers intentionally have inconsistent name-mangling as I understand it, but I'm not clever enough to understand C++ anyway :-/. ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] Network connection check
On Jul 23, 2009, at 7:36 AM, vipin kumar wrote:
> I can't use blocking communication routines in my main program
> ("masterprocess") because any type of network failure (may be due to
> physical connectivity or TCP connectivity or MPI connection as you told)
> may occur. So I am using non-blocking point-to-point communication
> routines, and TEST later for completion of that request. Once I enter a
> TEST loop I will test for request completion till TIMEOUT. Suppose
> TIMEOUT has occurred; in this case first I will check whether

Open MPI should return a failure if TCP connectivity is lost, even with a non-blocking point-to-point operation. The failure should be returned in the call to MPI_TEST (and friends).

So I'm not sure your timeout has meaning here -- if you reach the timeout, I think it simply means that the MPI communication has not completed yet. It does not necessarily mean that the MPI communication has failed.

> 1: Slave machine is reachable or not, (How will I do that? Given - I have
> the IP address and host name of the slave machine.)
>
> 2: if reachable, check whether the programs (orted and "slaveprocess")
> are alive or not.

MPI doesn't provide any standard way to check reachability and/or health of a peer process. That being said, I think some of the academics are working on more fault tolerant / resilient MPI messaging, but I don't know if they're ready to talk about such efforts publicly yet.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] Tuned collectives: How to choose them dynamically? (-mca coll_tuned_dynamic_rules_filename dyn_rules)"
Hi Gus, I played with collectives a few months ago. Details are here http://www.cse.scitech.ac.uk/disco/publications/WorkingNotes.ConnectX.pdf That was in the context of 1.2.6 You can get available tuning options by doing ompi_info -all -mca coll_tuned_use_dynamic_rules 1 | grep alltoall and similarly for other collectives. Best, Igor 2009/7/23 Gus Correa : > Dear OpenMPI experts > > I would like to experiment with the OpenMPI tuned collectives, > hoping to improve the performance of some programs we run > in production mode. > > However, I could not find any documentation on how to select the > different collective algorithms and other parameters. > In particular, I would love to read an explanation clarifying > the syntax and meaning of the lines on "dyn_rules" > file that is passed to > "-mca coll_tuned_dynamic_rules_filename ./dyn_rules" > > Recently there was an interesting discussion on the list > about this topic. It showed that choosing the right collective > algorithm can make a big difference in overall performance: > > http://www.open-mpi.org/community/lists/users/2009/05/9355.php > http://www.open-mpi.org/community/lists/users/2009/05/9399.php > http://www.open-mpi.org/community/lists/users/2009/05/9401.php > http://www.open-mpi.org/community/lists/users/2009/05/9419.php > > However, the thread was concentrated on "MPI_Alltoall". > Nothing was said about other collective functions. > Not much was said about the > "tuned collective dynamic rules" file syntax, > the meaning of its parameters, etc. > > Is there any source of information about that which I missed? > Thank you for any pointers or clarifications. > > Gus Correa > - > Gustavo Correa > Lamont-Doherty Earth Observatory - Columbia University > Palisades, NY, 10964-8000 - USA > - > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything
On Jul 22, 2009, at 3:17 AM, Alexey Sokolov wrote: from /home/user/NetBeansProjects/Correlation_orig/ Correlation/Correlation.cpp:2: /usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/ request_inln.h:347: warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything That's fairly odd, but if your program is not using the C++ bindings for MPI generalized requests, it won't matter. But as Jody noted, updating to Open MPI v1.3.3 is a better bet, anyway. Distro-default packages are great and convenient, but Open MPI releases at a faster pace than distros. It's annoying, but sometimes necessary to upgrade (especially if you're starting new and have no legacy reasons to stick with older software). FWIW: we slightly changed the routine that was issuing the warning to you in 1.3.3. Also, be aware that the MPI Forum is likely to deprecate the C++ bindings in MPI-2.2. They won't go away in MPI-2.2, but they may well go away in MPI-3. Open MPI (and others) will likely still include C++ binding functionality for a long time (to keep legacy codes still running), but they will become relegated to a minor subsystem. -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] Network connection check
Maybe you could make a system call to ping the other machine.

  char sCommand[512];
  // build the command string
  sprintf(sCommand, "ping -c %d -q %s > /dev/null", numPings, sHostName);
  // execute the command
  int iResult = system(sCommand);

If the ping was successful, iResult will have the value 0.

Jody

On Thu, Jul 23, 2009 at 1:36 PM, vipin kumar wrote:
> On Thu, Jul 23, 2009 at 3:03 PM, Ralph Castain wrote:
>>
>> It depends on which network fails. If you lose all TCP connectivity, Open
>> MPI should abort the job as the out-of-band system will detect the loss of
>> connection. If you only lose the MPI connection (whether TCP or some other
>> interconnect), then I believe the system will eventually generate an error
>> after it retries sending the message a specified number of times, though it
>> may not abort.
>
> Thank you Ralph,
>
> From your reply I came to know that the question I posted earlier was not
> reflecting the problem properly.
>
> I can't use blocking communication routines in my main program
> ("masterprocess") because any type of network failure (may be due to
> physical connectivity or TCP connectivity or MPI connection as you told)
> may occur. So I am using non-blocking point-to-point communication
> routines, and TEST later for completion of that request. Once I enter a
> TEST loop I will test for request completion till TIMEOUT. Suppose TIMEOUT
> has occurred; in this case first I will check whether
>
> 1: Slave machine is reachable or not, (How will I do that? Given - I have
> the IP address and host name of the slave machine.)
>
> 2: if reachable, check whether the programs (orted and "slaveprocess") are
> alive or not.
>
> I don't want to abort my master process in case 1 and hope that the network
> connection will come up in the future. Fortunately Open MPI doesn't abort
> any process. Both processes can run independently without communicating.
>
> Thanks and Regards,
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India
Re: [OMPI users] Network connection check
Jeff Squyres wrote:
> On Jul 22, 2009, at 10:05 AM, vipin kumar wrote:
>
>> Actually requirement is how a C/C++ program running in "master" node
>> should find out whether "slave" node is reachable (as we check this
>> using "ping" command) or not ? Because IP address may change at any
>> time, that's why I am trying to achieve this using "host name" of the
>> "slave" node. How this can be done?
>
> Are you asking to find out this information before issuing "mpirun"?
> Open MPI does assume that the nodes you are trying to use are reachable.

How about you start your MPI program from a shell script that does the following (a sketch appears below):

1. Reads a text file containing the names of all the possible candidates for MPI nodes.

2. Loops through the list of names from (1) and pings each machine to see if it's alive. If the host is pingable, then write its name to a different text file, which will be used as the machine file for the mpirun command.

3. Call mpirun using the machine file generated in (2).

--
Prentice
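For concreteness, here is a minimal sketch of those three steps written in C rather than as a shell script, reusing the system()/ping idea from Jody's message; the file names and the mpirun command line are illustrative assumptions, not anything specified in the thread.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *in  = fopen("candidates.txt", "r");   /* step 1: candidate hosts */
    FILE *out = fopen("machinefile.txt", "w");  /* step 2: reachable hosts */
    char host[256], cmd[512];

    if (!in || !out)
        return 1;

    while (fgets(host, sizeof(host), in)) {
        host[strcspn(host, "\r\n")] = '\0';     /* strip the newline */
        if (host[0] == '\0')
            continue;
        snprintf(cmd, sizeof(cmd), "ping -c 1 -q %s > /dev/null 2>&1", host);
        if (system(cmd) == 0)                   /* host answered the ping */
            fprintf(out, "%s\n", host);
    }
    fclose(in);
    fclose(out);

    /* step 3: launch with the generated machine file */
    return system("mpirun -machinefile machinefile.txt -np 4 ./my_mpi_app");
}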
Re: [OMPI users] Network connection check
On Thu, 23 Jul 2009, vipin kumar wrote: 1: Slave machine is reachable or not, (How I will do that ??? Given - I have IP address and Host Name of Slave machine.) 2: if reachable, check whether program(orted and "slaveprocess") is alive or not. You don't specify and based on your description I infer that you are not using a batch/queueing system, but just a rsh/ssh based start-up mechanism. A batch/queueing system might be able to tell you whether a remote computer is still accessible. I think that MPI is not the proper mechanism to achieve what you want. PVM or, maybe better, direct socket programming will probably serve you more. -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.coste...@iwr.uni-heidelberg.de
Re: [OMPI users] Network connection check
Thank you all, Jeff, Jody, Prentice and Bogdan, for your invaluable clarifications, solutions and suggestions.

> Open MPI should return a failure if TCP connectivity is lost, even with a
> non-blocking point-to-point operation. The failure should be returned in
> the call to MPI_TEST (and friends).

Even if MPI_TEST is a local operation?

> So I'm not sure your timeout has meaning here -- if you reach the timeout,
> I think it simply means that the MPI communication has not completed yet.
> It does not necessarily mean that the MPI communication has failed.

You are absolutely correct, but the job should be done before it expires; that's the reason I am using a TIMEOUT.

So the conclusion is:

> MPI doesn't provide any standard way to check reachability and/or health
> of a peer process.

That's what I wanted to confirm, and to find out the solution, if any, or any alternative. So now I think I should go for Jody's approach:

> How about you start your MPI program from a shell script that does the
> following:
>
> 1. Reads a text file containing the names of all the possible candidates
> for MPI nodes
>
> 2. Loops through the list of names from (1) and pings each machine to
> see if it's alive. If the host is pingable, then write its name to a
> different text file which will be used as the machine file for the
> mpirun command
>
> 3. Call mpirun using the machine file generated in (2).

I am assuming processes have been launched successfully.

--
Vipin K.
Research Engineer, C-DOTB, India
[OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3
I have built OpenMPI 1.3.3 without support for SGE. I just want to launch jobs with loose integration right now. Here is how I configured it: ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 --prefix=/opt/openmpi/1.3.3-pgi --without-sge --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1 --with-io-romio-flags=--with-file-system=lustre --enable-orterun-prefix-by-default I can start jobs from the commandline just fine. When I try to do the same thing inside an SGE job, I get errors like the following: error: executing task of job 5041155 failed: -- A daemon (pid 13324) died unexpectedly with status 1 while attempting to launch so we are aborting. There may be more information reported by the environment (see above). This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes. -- -- mpirun noticed that the job aborted, but has no info as to the process that caused that situation. -- mpirun: clean termination accomplished I am starting mpirun with the following options: $OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \ -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl The options are to ensure I am using IB, that SGE is not used, and that the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done correctly. This worked with 1.2.7 (except setting the pls option as gridengine instead of sge), but I can't get it to work with 1.3.3. Am I missing something obvious for getting jobs with loose integration started? Thanks, Craig
Re: [OMPI users] Network connection check
The 'system' command will fork a separate process to run. If I remember correctly, forking within MPI can lead to undefined behavior. Can someone in OpenMPI development team clarify? What I don't understand is: why is your TCP network so unstable that you are worried about reachability? For MPI to run, they should be connected on a local switch with a high bandwidth interconnect and not dispersed across the internet. Perhaps you should look at the underlying cause of network instability. If your network is actually stable, then your problem is only theoretical. Also, keep in mind that TCP itself offers a keepalive mechanism. Three parameters may be specified: the amount of inactivity after which the first probe is sent, the number of unanswered probes after which the connection is dropped and the interval between the probes. Typing 'sysctl -a' will print the entire IP MIB that has these names (I don't remember them off the top of my head). However, you say that you *don't* want to drop the connection, simply want to know about connectivity. What you can do, without causing 'undefined' MPI behaviour is to implement a similar mechanism in your MPI application. Durga On Thu, Jul 23, 2009 at 10:25 AM, vipin kumar wrote: > Thank you all Jeff, Jody, Prentice and Bogdan for your invaluable > clarification, solution and suggestion, > >> Open MPI should return a failure if TCP connectivity is lost, even with a >> non-blocking point-to-point operation. The failure should be returned in >> the call to MPI_TEST (and friends). > > even if MPI_TEST is a local operation? > >> >> So I'm not sure your timeout has meaning here -- if you reach the >> timeout, I think it simply means that the MPI communication has not >> completed yet. It does not necessarily mean that the MPI communication has >> failed. > > you are absolutely correct., but the job should be done before it expires. > that's the reason I am using TIMEOUT. > > So the conclusion is : >> >> MPI doesn't provide any standard way to check reachability and/or health >> of a peer process. > > That's what I wanted to confirm. And to find out the solution, if any, or > any alternative. > > So now I think, I should go for Jody's approach > >> >> How about you start your MPI program from a shell script that does the >> following: >> >> 1. Reads a text file containing the names of all the possible candidates >> for MPI nodes >> >> 2. Loops through the list of names from (1) and pings each machine to >> see if it's alive. If the host is pingable, then write it's name to a >> different text file which will be host as the machine file for the >> mpirun command > > >> >> 3. Call mpirun using the machine file generated in (2). > > I am assuming processes have been launched successfully. > > > > -- > Vipin K. > Research Engineer, > C-DOTB, India > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
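As a point of reference for the keepalive parameters mentioned above, the kernel-wide defaults live in the net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_probes and net.ipv4.tcp_keepalive_intvl sysctls on Linux, and an application can tune the same three knobs per socket for connections it owns itself (this cannot reach Open MPI's internal sockets). A minimal sketch, with illustrative values:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on one socket: first probe after 30 s of idle time,
 * give up after 3 unanswered probes sent 10 s apart.
 * TCP_KEEPIDLE/TCP_KEEPCNT/TCP_KEEPINTVL are Linux-specific options. */
int enable_keepalive(int fd)
{
    int on = 1, idle = 30, cnt = 3, intvl = 10;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    return 0;
}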
Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3
I think what you are looking for is this: --mca plm_rsh_disable_qrsh 1 This means we will disable the use of qrsh and use rsh or ssh instead. The --mca pls ^sge does not work anymore for two reasons. First, the "pls" framework was renamed "plm". Secondly, the gridgengine plm was folded into the rsh/ssh one. A few more details at http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge Rolf On 07/23/09 10:34, Craig Tierney wrote: I have built OpenMPI 1.3.3 without support for SGE. I just want to launch jobs with loose integration right now. Here is how I configured it: ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 --prefix=/opt/openmpi/1.3.3-pgi --without-sge --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1 --with-io-romio-flags=--with-file-system=lustre --enable-orterun-prefix-by-default I can start jobs from the commandline just fine. When I try to do the same thing inside an SGE job, I get errors like the following: error: executing task of job 5041155 failed: -- A daemon (pid 13324) died unexpectedly with status 1 while attempting to launch so we are aborting. There may be more information reported by the environment (see above). This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes. -- -- mpirun noticed that the job aborted, but has no info as to the process that caused that situation. -- mpirun: clean termination accomplished I am starting mpirun with the following options: $OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \ -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl The options are to ensure I am using IB, that SGE is not used, and that the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done correctly. This worked with 1.2.7 (except setting the pls option as gridengine instead of sge), but I can't get it to work with 1.3.3. Am I missing something obvious for getting jobs with loose integration started? Thanks, Craig ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- = rolf.vandeva...@sun.com 781-442-3043 =
Re: [OMPI users] Network connection check
> You don't specify and based on your description I infer that you are not
> using a batch/queueing system, but just a rsh/ssh based start-up mechanism.

You are absolutely correct. I am using an rsh/ssh based start-up mechanism.

> A batch/queueing system might be able to tell you whether a remote computer
> is still accessible.

Right now I don't have any idea about batch/queuing systems; I will explore that as well. And I think you mean it would tell me before launching the jobs.

> I think that MPI is not the proper mechanism to achieve what you want. PVM
> or, maybe better, direct socket programming will probably serve you more.

I will think about these as well. I have already spent a significant amount of time on LAM-MPI and Open MPI, and due to lack of time I don't want to switch to another mechanism. Anyway, Open MPI is doing great for me, at least 80% of what I want.

Thanks & Regards,
--
Vipin K.
Research Engineer, C-DOTB, India
Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3
Rolf Vandevaart wrote: > I think what you are looking for is this: > > --mca plm_rsh_disable_qrsh 1 > > This means we will disable the use of qrsh and use rsh or ssh instead. > > The --mca pls ^sge does not work anymore for two reasons. First, the > "pls" framework was renamed "plm". Secondly, the gridgengine plm was > folded into the rsh/ssh one. > Rolf, Thanks for the quick reply. That solved the problem. Craig > A few more details at > http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge > > Rolf > > On 07/23/09 10:34, Craig Tierney wrote: >> I have built OpenMPI 1.3.3 without support for SGE. >> I just want to launch jobs with loose integration right >> now. >> >> Here is how I configured it: >> >> ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 >> --prefix=/opt/openmpi/1.3.3-pgi --without-sge >> --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1 >> --with-io-romio-flags=--with-file-system=lustre >> --enable-orterun-prefix-by-default >> >> I can start jobs from the commandline just fine. When >> I try to do the same thing inside an SGE job, I get >> errors like the following: >> >> >> error: executing task of job 5041155 failed: >> -- >> >> A daemon (pid 13324) died unexpectedly with status 1 while attempting >> to launch so we are aborting. >> >> There may be more information reported by the environment (see above). >> >> This may be because the daemon was unable to find all the needed shared >> libraries on the remote node. You may set your LD_LIBRARY_PATH to have >> the >> location of the shared libraries on the remote nodes and this will >> automatically be forwarded to the remote nodes. >> -- >> >> -- >> >> mpirun noticed that the job aborted, but has no info as to the process >> that caused that situation. >> -- >> >> mpirun: clean termination accomplished >> >> >> I am starting mpirun with the following options: >> >> $OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \ >> -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl >> >> The options are to ensure I am using IB, that SGE is not used, and that >> the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done >> correctly. >> >> This worked with 1.2.7 (except setting the pls option as gridengine >> instead of sge), but I can't get it to work with 1.3.3. >> >> Am I missing something obvious for getting jobs with loose integration >> started? >> >> Thanks, >> Craig >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > -- Craig Tierney (craig.tier...@noaa.gov)
Re: [OMPI users] Profiling performance by forcing transport choice.
Nifty Tom Mitchell wrote:
> On Thu, Jun 25, 2009 at 08:37:21PM -0400, Jeff Squyres wrote:
>> Subject: Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2
>> forcing all MPI traffic over Ethernet instead of using Infiniband
>
> While the previous thread on "performance reduction" went left, right,
> forward and beyond the initial topic, it tickled an idea for application
> profiling or characterizing.
>
> What if the various transports (btl) had knobs that permitted stepwise
> insertion of bandwidth limits and latency limits etc. so the application
> might be characterized better?

I'm unclear what you're asking about. Are you asking that a BTL would limit the performance delivered to the application? E.g., the interconnect is capable of 1 Gbyte/sec, but you only deliver 100 Mbyte/sec (or whatever the user selects) to the app so the user can see whether bandwidth is a sensitive parameter for the app?

If so, I have a few thoughts.

1) The actual limitations of an MPI implementation may be hard to model. E.g., the amount of handshaking between processes, synchronization delays, etc.

2) For the most part, you could (actually even should) try doing this stuff much higher up than the BTLs. E.g., how about developing a PMPI layer that does what you're talking about.

3) I think folks have tried this sort of thing in the past by instrumenting the code and then "playing it back" or "simulating" with other performance parameters. E.g., "I run for X cycles, then I send an N-byte message, then compute another Y cycles, then post a receive, then ..." and then turn the knobs for latency, bandwidth, etc., to see at what point any of these become sensitive parameters. You might see: gosh, as long as latency is lower than about 30-70 usec, it really isn't important. Or, whatever. Off hand, I think different people have tried this approach and (without bothering to check my notes to see if my memory is any good) I think Dimemmas (associated with Paraver and CEPBA Barcelona) was one such tool.

> Most micro benchmarks are designed to measure various hardware
> characteristics but it is moderately hard to know what an application
> depends on.
>
> The value of this is that:
>
> * the application authors might learn something about their code that is
>   hard to know at a well abstracted API level.
>
> * the purchasing decision maker would have the ability to access a well
>   instrumented cluster and build a weighted value equation to help
>   structure the decision.
>
> * the hardware vendor can learn what is valuable when deciding what
>   feature and function needs the most attention/transistors.
>
> i.e. it might be as valuable to benchmark "your code" on a single well
> instrumented platform as it might be to benchmark all the hardware you
> can get "yer hands on".
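To make the PMPI suggestion in point 2 concrete, here is a minimal sketch of an interposition layer that adds artificial latency to every MPI_Send and forwards the real work to PMPI_Send. The 50-microsecond delay is an arbitrary illustration; a real characterization layer would make it configurable and would wrap the other communication calls as well.

#include <mpi.h>
#include <unistd.h>

/* Intercept MPI_Send via the MPI profiling interface: add a fixed delay,
 * then forward to the real implementation.  Compile this into the
 * application (or a library linked ahead of it) to inflate latency. */
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    usleep(50);   /* hypothetical extra latency in microseconds */
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}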
[OMPI users] TCP btl misbehaves if btl_tcp_port_min_v4 is not set.
Hello all,

(this _might_ be related to https://svn.open-mpi.org/trac/ompi/ticket/1505)

I just compiled and installed 1.3.3 in a CentOS 5 environment and we noticed the processes would deadlock as soon as they would start using TCP communications. The test program is one that has been running on other clusters for years with no problems. Furthermore, using local cores doesn't deadlock the process, whereas forcing inter-node communications (-bynode scheduling) immediately causes the problem.

Symptoms:

- Processes don't crash or die; they use 100% CPU in system space (as opposed to user space).
- stracing one of the processes will show it is freewheeling in a polling loop.
- Executing with --mca btl_base_verbose 30 will show weird port assignments; either they are wrong or they should be interpreted as an offset from the default btl_tcp_port_min_v4 (1024).
- The error "mca_btl_tcp_endpoint_complete_connect] connect() to failed: No route to host (113)" _may_ be seen. We noticed it only showed up if we had vmnet interfaces up and running on certain nodes.

Note that setting

  oob_tcp_listen_mode=listen_thread
  oob_tcp_if_include=eth0
  btl_tcp_if_include=eth0

was one of our first reactions to this, to no avail.

Workaround we found: while keeping the above-mentioned MCA parameters, we added btl_tcp_port_min_v4=2000 due to some firewall rules (which we had obviously disabled as part of the troubleshooting process) and noticed everything seemed to start working correctly from there on. This seems to work, but I can find no logical explanation, as the code seems to be clean in that respect.

Some pasting for people searching frantically for a solution:

[cluster-srv2:20379] btl: tcp: attempting to connect() to address 10.194.32.113 on port 2052
[cluster-srv2:20381] btl: tcp: attempting to connect() to address 10.194.32.113 on port 3076
[cluster-srv2:20377] btl: tcp: attempting to connect() to address 10.194.32.113 on port 260
[cluster-srv2:20383] btl: tcp: attempting to connect() to address 10.194.32.113 on port 3588
[cluster-srv1:19900] btl: tcp: attempting to connect() to address 10.194.32.117 on port 1540
[cluster-srv2:20377] btl: tcp: attempting to connect() to address 10.194.32.117 on port 2052
[cluster-srv2:20383] btl: tcp: attempting to connect() to address 10.194.32.117 on port 3076
[cluster-srv1:19894] btl: tcp: attempting to connect() to address 10.194.32.117 on port 516
[cluster-srv2:20379] btl: tcp: attempting to connect() to address 10.194.32.117 on port 3588
[cluster-srv1:19898] btl: tcp: attempting to connect() to address 10.194.32.117 on port 1028
[cluster-srv2:20381] btl: tcp: attempting to connect() to address 10.194.32.117 on port 2564
[cluster-srv1:19896] btl: tcp: attempting to connect() to address 10.194.32.117 on port 4
[cluster-srv3:13665] btl: tcp: attempting to connect() to address 10.194.32.115 on port 1028
[cluster-srv3:13663] btl: tcp: attempting to connect() to address 10.194.32.115 on port 4
[cluster-srv2][[44096,1],9][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
[cluster-srv2][[44096,1],13][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 10.194.32.117 failed: No route to host (113)
connect() to 10.194.32.117 failed: No route to host (113)
[cluster-srv3][[44096,1],20][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 10.194.32.115 failed: No route to host (113)

Cheers!

Eric Thiboedau
Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
Hi Ralph, Thanks for the fast reply! I put the --display-allocation and --display-map flags on and it looks like the nodes allocation is just fine, but the job still hang. The output looks like this: /home/kaisong/test node0001 node0001 node node Starting parallel job == ALLOCATED NODES == Data for node: Name: node0001 Num slots: 2Max slots: 0 Data for node: Name: node Num slots: 2Max slots: 0 = JOB MAP Data for node: Name: node0001 Num procs: 2 Process OMPI jobid: [16591,1] Process rank: 0 Process OMPI jobid: [16591,1] Process rank: 1 Data for node: Name: node Num procs: 2 Process OMPI jobid: [16591,1] Process rank: 2 Process OMPI jobid: [16591,1] Process rank: 3 = (no hello wrold output, job just hang here until timeout). And similar thing in the error output: node - daemon did not report back when launched Then, I ran the job manually by adding "-mca btl gm" flag for mpirun: /home/software/ompi/1.3.2-pgi/bin/mpirun -mca gm --display-allocation --display-map -v -machinefile ./node -np 4 ./hello-hostname MPI crashed with the following output/error: == ALLOCATED NODES == Data for node: Name: hbar.lbl.gov Num slots: 0Max slots: 0 Data for node: Name: node0045 Num slots: 4Max slots: 0 Data for node: Name: node0046 Num slots: 4Max slots: 0 Data for node: Name: node0047 Num slots: 4Max slots: 0 Data for node: Name: node0048 Num slots: 4Max slots: 0 = JOB MAP Data for node: Name: node0045 Num procs: 4 Process OMPI jobid: [62741,1] Process rank: 0 Process OMPI jobid: [62741,1] Process rank: 1 Process OMPI jobid: [62741,1] Process rank: 2 Process OMPI jobid: [62741,1] Process rank: 3 = -- At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL. Process 1 ([[62741,1],1]) is on host: node0045 Process 2 ([[62741,1],1]) is on host: node0045 BTLs attempted: gm Your MPI job is now going to abort; sorry. -- -- -- -- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): PML add procs failed --> Returned "Unreachable" (-12) instead of "Success" (0) -- *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [node0045:366] Abort before MPI_INIT completed successfully; not able to guarantee that all other process ! *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [node0045:367] Abort before MPI_INIT completed successfully; not able to guarantee that all other process ! *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [node0045:368] Abort before MPI_INIT completed successfully; not able to guarantee that all other process ! *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [node0045:365] Abort before MPI_INIT completed successfully; not able to guarantee that all other process ! 
-- mpirun has exited due to process rank 3 with PID 368 on node node0045 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here). -- [hbar.lbl.gov:07770] 3 more processes have
Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
My apologies - I had missed that -mca btl flag. That is the source of the trouble. IIRC, GM doesn't have a loopback method in it. OMPI requires that -every- proc be able to reach -every- proc, including itself. So you must include the "self" btl at a minimum. Also, if you want more performance, you probably want to include the shared memory BTL as well. So the recommended param would be: -mca btl gm,sm,self Order doesn't matter. I'm disturbed that it would hang when you run in batch, though, instead of abort. Try with this new flag and see if it runs in both batch and interactive mode. HTH Ralph On Jul 23, 2009, at 1:10 PM, Song, Kai Song wrote: Hi Ralph, Thanks for the fast reply! I put the --display-allocation and -- display-map flags on and it looks like the nodes allocation is just fine, but the job still hang. The output looks like this: /home/kaisong/test node0001 node0001 node node Starting parallel job == ALLOCATED NODES == Data for node: Name: node0001 Num slots: 2Max slots: 0 Data for node: Name: node Num slots: 2Max slots: 0 = JOB MAP Data for node: Name: node0001 Num procs: 2 Process OMPI jobid: [16591,1] Process rank: 0 Process OMPI jobid: [16591,1] Process rank: 1 Data for node: Name: node Num procs: 2 Process OMPI jobid: [16591,1] Process rank: 2 Process OMPI jobid: [16591,1] Process rank: 3 = (no hello wrold output, job just hang here until timeout). And similar thing in the error output: node - daemon did not report back when launched Then, I ran the job manually by adding "-mca btl gm" flag for mpirun: /home/software/ompi/1.3.2-pgi/bin/mpirun -mca gm --display- allocation --display-map -v -machinefile ./node -np 4 ./hello-hostname MPI crashed with the following output/error: == ALLOCATED NODES == Data for node: Name: hbar.lbl.gov Num slots: 0Max slots: 0 Data for node: Name: node0045 Num slots: 4Max slots: 0 Data for node: Name: node0046 Num slots: 4Max slots: 0 Data for node: Name: node0047 Num slots: 4Max slots: 0 Data for node: Name: node0048 Num slots: 4Max slots: 0 = JOB MAP Data for node: Name: node0045 Num procs: 4 Process OMPI jobid: [62741,1] Process rank: 0 Process OMPI jobid: [62741,1] Process rank: 1 Process OMPI jobid: [62741,1] Process rank: 2 Process OMPI jobid: [62741,1] Process rank: 3 = -- At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL. Process 1 ([[62741,1],1]) is on host: node0045 Process 2 ([[62741,1],1]) is on host: node0045 BTLs attempted: gm Your MPI job is now going to abort; sorry. -- -- -- -- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): PML add procs failed --> Returned "Unreachable" (-12) instead of "Success" (0) -- *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [node0045:366] Abort before MPI_INIT completed successfully; not able to guarantee that all other process ! 
*** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [node0045:367] Abort before MPI_INIT completed successfully; not able to guarantee that all other process ! *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort) [node0045:368] Abort before MPI_INIT completed successfully; not able to guarantee that all ot
Re: [OMPI users] Receiving an unknown number of messages
Eugene Loh wrote:
> Shaun Jackman wrote:
>> For my MPI application, each process reads a file and for each line sends
>> a message (MPI_Send) to one of the other processes determined by the
>> contents of that line. Each process posts a single MPI_Irecv and uses
>> MPI_Request_get_status to test for a received message. If a message has
>> been received, it processes the message and posts a new MPI_Irecv. I
>> believe this situation is not safe and prone to deadlock since MPI_Send
>> may block. The receiver would need to post as many MPI_Irecv as messages
>> it expects to receive, but it does not know in advance how many messages
>> to expect from the other processes. How is this situation usually handled
>> in an MPI application where the number of messages to receive is unknown?
>
> ... Each process posts an MPI_Irecv to listen for in-coming messages. Each
> process enters a loop in which it reads its file and sends out messages.
> Within this loop, you also loop on MPI_Test to see if any message has
> arrived. If so, process it, post another MPI_Irecv(), and keep polling.
> (I'd use MPI_Test rather than MPI_Request_get_status since you'll have to
> call something like MPI_Test anyhow to complete the receive.) Once you've
> posted all your sends, send out a special message to indicate you're
> finished. I'm thinking of some sort of tree fan-in/fan-out barrier so that
> everyone will know when everyone is finished. Keep polling on MPI_Test,
> processing further receives or advancing your fan-in/fan-out barrier.
>
> So, the key ingredients are:
>
> *) keep polling on MPI_Test and reposting MPI_Irecv calls to drain
> in-coming messages while you're still in your "send" phase
>
> *) have another mechanism for processes to notify one another when they've
> finished their send phases

Hi Eugene,

Very astute. You've pretty much exactly described how it works now, particularly the loop around MPI_Test and MPI_Irecv to drain incoming messages. So, here's my worry, which I'll demonstrate with an example. We have four processes. Each calls MPI_Irecv once. Each reads one line of its file. Each sends one message with MPI_Send to some other process based on the line that it has read, and then goes into the MPI_Test/MPI_Irecv loop. The events fall out in this order:

2 sends to 0 and does not block (0 has one MPI_Irecv posted)
3 sends to 1 and does not block (1 has one MPI_Irecv posted)
0 receives the message from 2, consuming its MPI_Irecv
1 receives the message from 3, consuming its MPI_Irecv
0 sends to 1 and blocks (1 has no more MPI_Irecv posted)
1 sends to 0 and blocks (0 has no more MPI_Irecv posted)

and now processes 0 and 1 are deadlocked. When I say `receives' above, I mean that Open MPI has received the message and copied it into the buffer passed to the MPI_Irecv call, but the application hasn't yet called MPI_Test. The next step would be for all the processes to call MPI_Test, but 0 and 1 are already deadlocked.

Cheers,
Shaun
[OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler
I have an OpenMPI program compiled with a version of OpenMPI built using the ifort 10.1 compiler. I can compile and run this code with no problem using the 32-bit version of ifort, and I can also submit batch jobs using torque with this 32-bit code. However, compiling the same code to produce a 64-bit executable produces a code that runs correctly only in the simplest cases. It does not run correctly when run under the torque batch queuing system, running for a while and then giving a segmentation violation in a section of code that is fine in the 32-bit version.

I have to run the MPI multinode jobs using our torque batch queuing system, but we do have the capability of running the jobs in an interactive batch environment. If I do a

  qsub -I -l nodes=1:x4gb

I get an interactive session on the remote node assigned to my job. I can run the job using either ./MPI_li_64 or mpirun -np 1 ./MPI_li_64 and the job runs successfully to completion. I can also start an interactive shell using qsub -I -l nodes=1:ppn=2:x4gb and I will get a single dual-processor (or greater) node. On this single node, mpirun -np 2 ./MPI_li_64 works.

However, if instead I ask for two nodes in my interactive batch job, qsub -I -l nodes=2:x4gb, two nodes will be assigned to me, but when I enter mpirun -np 2 ./MPI_li_64 the job runs a while, then fails with:

  mpirun noticed that process rank 1 with PID 23104 on node n339 exited on signal 11 (Segmentation fault).

I can trace this in the intel debugger and see that the segmentation fault is occurring in what should be good code, and in code that executes with no problem when everything is compiled 32-bit. I am at a loss for what could be preventing this code from running within the batch queuing environment in the 64-bit version.

Jim
[OMPI users] Interaction of MPI_Send and MPI_Barrier
Hi,

Two processes run the following program:

  request = MPI_Irecv
  MPI_Send (to the other process)
  MPI_Barrier
  flag = MPI_Test(request)

Without the barrier, there's a race and MPI_Test may or may not return true, indicating whether the message has been received. With the barrier, is it guaranteed that the message will have been received, and MPI_Test will return true?

Cheers,
Shaun
Re: [OMPI users] Receiving an unknown number of messages
Shaun Jackman wrote: Eugene Loh wrote: Shaun Jackman wrote: For my MPI application, each process reads a file and for each line sends a message (MPI_Send) to one of the other processes determined by the contents of that line. Each process posts a single MPI_Irecv and uses MPI_Request_get_status to test for a received message. If a message has been received, it processes the message and posts a new MPI_Irecv. I believe this situation is not safe and prone to deadlock since MPI_Send may block. The receiver would need to post as many MPI_Irecv as messages it expects to receive, but it does not know in advance how many messages to expect from the other processes. How is this situation usually handled in an MPI appliation where the number of messages to receive is unknown? ... Each process posts an MPI_Irecv to listen for in-coming messages. Each process enters a loop in which it reads its file and sends out messages. Within this loop, you also loop on MPI_Test to see if any message has arrived. If so, process it, post another MPI_Irecv(), and keep polling. (I'd use MPI_Test rather than MPI_Request_get_status since you'll have to call something like MPI_Test anyhow to complete the receive.) Once you've posted all your sends, send out a special message to indicate you're finished. I'm thinking of some sort of tree fan-in/fan-out barrier so that everyone will know when everyone is finished. Keep polling on MPI_Test, processing further receives or advancing your fan-in/fan-out barrier. So, the key ingredients are: *) keep polling on MPI_Test and reposting MPI_Irecv calls to drain in-coming messages while you're still in your "send" phase *) have another mechanism for processes to notify one another when they've finished their send phases Hi Eugene, Very astute. You've pretty much exactly described how it works now, particularly the loop around MPI_Test and MPI_Irecv to drain incoming messages. So, here's my worry, which I'll demonstrate with an example. We have four processes. Each calls MPI_Irecv once. Each reads one line of its file. Each sends one message with MPI_Send to some other process based on the line that it has read, and then goes into the MPI_Test/MPI_Irecv loop. The events fall out in this order 2 sends to 0 and does not block (0 has one MPI_Irecv posted) 3 sends to 1 and does not block (1 has one MPI_Irecv posted) 0 receives the message from 2, consuming its MPI_Irecv 1 receives the message from 3, consuming its MPI_Irecv 0 sends to 1 and blocks (1 has no more MPI_Irecv posted) 1 sends to 0 and blocks (0 has no more MPI_Irecv posted) and now processes 0 and 1 are deadlocked. When I say `receives' above, I mean that Open MPI has received the message and copied it into the buffer passed to the MPI_Irecv call, but the application hasn't yet called MPI_Test. The next step would be for all the processes to call MPI_Test, but 0 and 1 are already deadlocked. I don't get it. Processes should drain aggressively. So, if 0 receives a message, it should immediately post the next MPI_Irecv. Before 0 posts a send, it should MPI_Test (and post the next MPI_Irecv if the test received a message). Further, you could convert to MPI_Isend. But maybe I'm missing something.
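A minimal sketch of that "drain aggressively" pattern is below: every send is non-blocking, and incoming messages are tested for and re-posted while the local send is still in flight. The read_line(), pick_dest() and process_line() hooks, the message size, and the tag are hypothetical placeholders for whatever the application actually does.

#include <mpi.h>

#define MSG_LEN 256

/* Hypothetical application hooks (not from this thread): */
int  read_line(char *buf);        /* fills buf with the next line, 0 at EOF */
int  pick_dest(const char *buf);  /* destination rank for this line */
void process_line(const char *buf);

void send_phase(MPI_Comm comm)
{
    char inbuf[MSG_LEN], outbuf[MSG_LEN];
    MPI_Request recv_req, send_req;
    int got_msg, sent;

    /* Always keep one receive posted so peers never block on us. */
    MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, 0, comm, &recv_req);

    while (read_line(outbuf)) {
        MPI_Isend(outbuf, MSG_LEN, MPI_CHAR, pick_dest(outbuf), 0, comm,
                  &send_req);
        /* Drain incoming messages until our own send completes. */
        do {
            MPI_Test(&recv_req, &got_msg, MPI_STATUS_IGNORE);
            if (got_msg) {
                process_line(inbuf);
                MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, 0, comm,
                          &recv_req);
            }
            MPI_Test(&send_req, &sent, MPI_STATUS_IGNORE);
        } while (!sent);
    }
    /* A termination protocol (e.g. the fan-in/fan-out barrier discussed
     * above) still has to decide when to stop polling and deal with the
     * final outstanding MPI_Irecv. */
}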
Re: [OMPI users] Interaction of MPI_Send and MPI_Barrier
No - it is not guaranteed (it is highly probable though).

The return from the MPI_Send only guarantees that the data is safely held somewhere other than the send buffer, so you are free to modify the send buffer. The MPI standard does not say where the data is to be held. It only says that once the MPI_Test is successful, the data will have been delivered to the receive buffer.

Consider this possible scenario for a small message:

- The data is sent toward the destination.
- To allow the MPI_Send to complete promptly, libmpi makes a temporary copy of the message.
- The MPI_Send returns once the copy is made.
- The message gets lost in the network.
- The MPI_Barrier does its communication without packet loss and completes.
- The call to MPI_Test returns false.
- The send side gets no ack for the lost message and eventually retransmits it from the saved temp. This time it gets through.
- A later MPI_Test succeeds.
- An ack eventually gets back to the sender and it throws away the temp copy of the message it was keeping in case a retransmit was needed.

I am not saying any particular MPI library would work this way, but it is one kind of thing that a libmpi might do to give better performance while maintaining the strict rules of MPI semantics. Since the MPI_Barrier does not make any guarantee about the destination status of sends done before it, this kind of optimization is legitimate.

If you must know that the message has been received once the barrier returns, you need to MPI_Wait the message before you call barrier.

Dick

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

users-boun...@open-mpi.org wrote on 07/23/2009 05:02:51 PM:

> [OMPI users] Interaction of MPI_Send and MPI_Barrier
> Shaun Jackman to: Open MPI, 07/23/2009 05:04 PM
>
> Hi,
>
> Two processes run the following program:
>
> request = MPI_Irecv
> MPI_Send (to the other process)
> MPI_Barrier
> flag = MPI_Test(request)
>
> Without the barrier, there's a race and MPI_Test may or may not return
> true, indicating whether the message has been received. With the
> barrier, is it guaranteed that the message will have been received,
> and MPI_Test will return true?
>
> Cheers,
> Shaun
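Putting Dick's last point into code, here is a minimal sketch of the ordering that does give the guarantee: complete the receive with MPI_Wait before entering the barrier. The int payload and the tag are illustrative.

#include <mpi.h>

/* Each of the two ranks posts its receive first, sends to the peer, then
 * completes the receive *before* the barrier, so that after MPI_Barrier
 * returns every rank is known to have its data. */
void exchange_then_barrier(int peer, MPI_Comm comm)
{
    int inbuf = 0, outbuf = 42;
    MPI_Request req;

    MPI_Irecv(&inbuf, 1, MPI_INT, peer, 0, comm, &req);
    MPI_Send(&outbuf, 1, MPI_INT, peer, 0, comm);
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* receive is complete here */
    MPI_Barrier(comm);
}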
[OMPI users] Backwards compatibility?
Is OpenMPI backwards compatible? I.e., if I am running 1.3.1 on one machine and 1.3.3 on the rest, is it supposed to work? Or do they all need exactly the same version?

When I add this wrong-version machine to the machine list with a simple "hello world from each process" type program, I see no output whatsoever, even with the verbose flag - it just sits there indefinitely.

Thanks,

David
Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler
What OMPI version are you using? On Jul 23, 2009, at 3:00 PM, Sims, James S. Dr. wrote: I have an OpenMPI program compiled with a version of OpenMPI built using the ifort 10.1 compiler. I can compile and run this code with no problem, using the 32 bit version of ifort. And I can also submit batch jobs using torque with this 32-bit code. However, compiling the same code to produce a 64 bit executable produces a code that runs correctly only in the simplest cases. It does not run correctly when run under the torque batch queuing system, running for awhile and then giving a segmentation violation in s section of code that is fine in the 32 bit version. I have to run the mpi multinode jobs using our torque batch queuing system, but we do have the capability of running the jobs in an interactive batch environment. If I do a qsub -I -l nodes=1:x4gb I get an interactive session on the remote node assigned to my job. I can run the job using either ./MPI_li_64 or mpirun -np 1 ./MPI_li_64 and the job runs successfully to completion. I can also start an interactive shell using qsub -I -l nodes=1:ppn=2:x4gb and I will get a single dual processor (or greater node). On this single node, mpirun -np 2 ./MPI_li_64 works. However, if instead I ask for two nodes in my interactive batch node, qsub -I -l nodes=2:x4gb, Two nodes will be assigned to me but when I enter mpirun -np 2 ./MPI_li_64 the job runs awhile, then fails with a mpirun noticed that process rank 1 with PID 23104 on node n339 exited on signal 11 (Segmentation fault). I can trace this in the intel debugger and see that the segmentation fault is occuring in what should be good code, and in code that executes with no problem when everything is compiled 32-bit. I am at a loss for what could be preventing this code to run within the batch queuing environment in the 64-bit version. Jim ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Backwards compatibility?
I doubt those two would work together - however, a combination of 1.3.2 and 1.3.3 should. You might look at the ABI compatibility discussion threads (there have been several) on this list for the reasons. Basically, binary compatibility is supported starting with 1.3.2 and above. On Jul 23, 2009, at 3:28 PM, David Doria wrote: Is OpenMPI backwards compatible? I.e. If I am running 1.3.1 on one machine and 1.3.3 on the rest, is it supposed to work? Or do they all need exactly the same version? When I add this wrong version machine to the machinelist, with a simple "hello world from each process type program", I see no output what soever, even with the verbose flag - it just sits there indefinitely. Thanks, David ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
Hi Ralph,

With the flag -mca btl gm,sm,self, running the job manually works and has better performance, as you said! However, it still hangs when it goes through the PBS scheduler. Here is my PBS script:

#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
#PBS -k eo
cd ~kaisong/test
echo `pwd`
cat $PBS_NODEFILE
echo "Starting parallel job"
/home/software/ompi/1.3.2-pgi/bin/mpirun -mca btl gm,self --display-allocation --display-map -d 8 -v -machinefile $PBS_NODEFILE -np 4 ./hello-hostname
echo "ending parallel job"

The error message and output file from torque are the same as before. What other problems do you think it could be...? Please let me know if you need more information about our system. Thanks a lot for helping me along this far! I hope we are getting close to finding the real problem.

Kai

Kai Song 1.510.486.4894 High Performance Computing Services (HPCS) Intern Lawrence Berkeley National Laboratory - http://scs.lbl.gov

- Original Message -
From: Ralph Castain
List-Post: users@lists.open-mpi.org
Date: Thursday, July 23, 2009 1:06 pm
Subject: Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
To: "Song, Kai Song"
Cc: Open MPI Users

> My apologies - I had missed that -mca btl flag. That is the source of
> the trouble. IIRC, GM doesn't have a loopback method in it. OMPI
> requires that -every- proc be able to reach -every- proc, including
> itself.
>
> So you must include the "self" btl at a minimum. Also, if you want
> more performance, you probably want to include the shared memory BTL
> as well.
>
> So the recommended param would be:
>
> -mca btl gm,sm,self
>
> Order doesn't matter. I'm disturbed that it would hang when you run in
> batch, though, instead of abort. Try with this new flag and see if it
> runs in both batch and interactive mode.
>
> HTH
> Ralph
>
> On Jul 23, 2009, at 1:10 PM, Song, Kai Song wrote:
>
> > Hi Ralph,
> >
> > Thanks for the fast reply! I put the --display-allocation and
> > --display-map flags on and it looks like the node allocation is just
> > fine, but the job still hangs.
> >
> > The output looks like this:
> > /home/kaisong/test
> > node0001
> > node0001
> > node
> > node
> > Starting parallel job
> >
> > == ALLOCATED NODES ==
> >
> > Data for node: Name: node0001  Num slots: 2  Max slots: 0
> > Data for node: Name: node  Num slots: 2  Max slots: 0
> >
> > =
> >
> > JOB MAP
> >
> > Data for node: Name: node0001  Num procs: 2
> > Process OMPI jobid: [16591,1] Process rank: 0
> > Process OMPI jobid: [16591,1] Process rank: 1
> >
> > Data for node: Name: node  Num procs: 2
> > Process OMPI jobid: [16591,1] Process rank: 2
> > Process OMPI jobid: [16591,1] Process rank: 3
> >
> > =
> > (no hello world output; the job just hangs here until timeout).
> > And a similar thing appears in the error output:
> > node - daemon did not report back when launched
> >
> > Then, I ran the job manually by adding the "-mca btl gm" flag for mpirun:
> > /home/software/ompi/1.3.2-pgi/bin/mpirun -mca gm --display-allocation --display-map -v -machinefile ./node -np 4 ./hello-hostname
> >
> > MPI crashed with the following output/error:
> > == ALLOCATED NODES ==
> >
> > Data for node: Name: hbar.lbl.gov  Num slots: 0  Max slots: 0
> > Data for node: Name: node0045  Num slots: 4  Max slots: 0
> > Data for node: Name: node0046  Num slots: 4  Max slots: 0
> > Data for node: Name: node0047  Num slots: 4  Max slots: 0
> > Data for node: Name: node0048  Num slots: 4  Max slots: 0
> >
> > =
> >
> > JOB MAP
> >
> > Data for node: Name: node0045  Num procs: 4
> > Process OMPI jobid: [62741,1] Process rank: 0
> > Process OMPI jobid: [62741,1] Process rank: 1
> > Process OMPI jobid: [62741,1] Process rank: 2
> > Process OMPI jobid: [62741,1] Process rank: 3
> >
> > =
> > --
> > At least one pair of MPI processes are unable to reach each other for
> > MPI communications. This means that no Open MPI device has indicated
> > that it can be used to communicate between these processes. This is
> > an error; Open MPI requires that all MPI processes be able to reach
> > each other. This error can sometimes be the result of forgetting to
> > specify the "self" BTL.
> >
> > Process 1 ([[62741,1],1]) is on host: node0045
> > Process 2 ([[62741,1],1]) is on host: node0045
> >
Re: [OMPI users] Backwards compatibility?
On Thu, Jul 23, 2009 at 5:47 PM, Ralph Castain wrote:
> I doubt those two would work together - however, a combination of 1.3.2 and
> 1.3.3 should.
>
> You might look at the ABI compatibility discussion threads (there have been
> several) on this list for the reasons. Basically, binary compatibility is
> supported starting with 1.3.2 and above.

Ok - I'll make sure to use all the same version. Is there any way that can be detected and an error thrown? It took me quite a while to figure out that one machine was the wrong version.

Thanks, David
Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler
[sims@raritan openmpi]$ mpirun -V
mpirun (Open MPI) 1.3.1rc4

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Ralph Castain [r...@open-mpi.org]
Sent: Thursday, July 23, 2009 5:44 PM
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler

What OMPI version are you using?
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler
Okay - thanks! First, be assured we run 64-bit ifort code under Torque at large scale all the time here at LANL, so this is likely to be something trivial in your environment. A few things to consider/try:

1. The most likely culprit is that your LD_LIBRARY_PATH is pointing to the 32-bit libraries on the other nodes. Torque does -not- copy your environment by default, and neither does OMPI. Try adding -x LD_LIBRARY_PATH to your cmd line, making sure that the 64-bit libs are before any 32-bit libs in that envar. This tells mpirun to pick up that envar and propagate it for you.

2. Check to ensure you are in fact using a 64-bit version of OMPI. Run "ompi_info --config" to see how it was built. Also run "mpif90 --showme" and see what libs it is linked to. Does your LD_LIBRARY_PATH match the names and ordering?

3. Get a multi-node allocation and run "pbsdsh echo $LD_LIBRARY_PATH" and see what libs you are defaulting to on the other nodes.

I realize these are somewhat overlapping, but I think you catch the drift - I suspect you are getting the infamous "library confusion" problem.

HTH
Ralph

On Jul 23, 2009, at 7:49 PM, Sims, James S. Dr. wrote:
[sims@raritan openmpi]$ mpirun -V
mpirun (Open MPI) 1.3.1rc4
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
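[Editor's illustration] To complement the suggestions above, a small diagnostic MPI program can show which LD_LIBRARY_PATH each rank actually inherits on its node when launched through Torque. This is a generic sketch, not code from the thread, and it assumes it is compiled and launched the same way as the failing 64-bit application:

/* Diagnostic sketch: print the environment each MPI rank actually sees.
 * Launch it like the real job, e.g. mpirun -x LD_LIBRARY_PATH -np 2 ./env_check,
 * so differences between nodes (32-bit vs. 64-bit library paths) show up. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];
    const char *ld = getenv("LD_LIBRARY_PATH");

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &namelen);
    printf("rank %d on %s: LD_LIBRARY_PATH=%s\n",
           rank, host, ld ? ld : "(unset)");
    MPI_Finalize();
    return 0;
}

Running it with and without -x LD_LIBRARY_PATH makes it easy to see whether the remote ranks are resolving 32-bit library paths.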