Re: [OMPI users] SilverStorm IB
On Apr 12, 2006, at 8:59 PM, Jeff Squyres (jsquyres) wrote:

> FWIW, the "has a different size..." errors mean that you may not have
> been linking against the shared libraries that you thought you were.
> This typically means that the executable expected to find an object in
> a library of a given size, but the actual size of the object was
> different. So some kind of mismatch was occurring, and the segv at the
> end was therefore not surprising.

Yeah; I wasn't surprised either. That's why I just re-compiled the app and ran it. Then it worked.

I'm suspicious (but can't prove it) that the opensm subnet manager (running on another node, and on the Mellanox 'ib gold' stack) wasn't working properly. The problem is that I have nothing to back up the suspicion. But the behavior was consistent with what I'd see if there were no subnet manager on the IB fabric (which may well have been the case, actually).

It's working now, though...

--
Troy Telford
Re: [OMPI users] Error while loading shared libraries
The error message is coming from all nodes.

I explicitly add the path of the Intel shared library to LD_LIBRARY_PATH on my mpiexec command line, in addition to adding it in my shell startup file.

I submit the job as a batch request to PBS. The Intel shared library is on a common file system across the compute nodes.

- Original Message -
From: "Jeff Squyres (jsquyres)"
To: "Open MPI Users"
Sent: Wednesday, April 12, 2006 11:03 PM
Subject: Re: [OMPI users] Error while loading shared libraries

My mistake -- I missed the "orted" part of the error message.

"orted" is a helper application that is intentionally launched by Open MPI during mpirun. What is happening is that it is not able to find the Intel libraries, and is therefore failing to launch.

So why is it not finding the Intel shared library?

- Is this error message coming from a remote node?
- Is your LD_LIBRARY_PATH set on all your remote nodes? For example, if you're using rsh or ssh to start processes (vs. a resource manager such as SLURM or Torque), you will need to ensure that your shell startup files on all the nodes set LD_LIBRARY_PATH properly (i.e., it's not enough to "setenv LD_LIBRARY_PATH ...; mpirun ..." because the LD_LIBRARY_PATH value won't be set on all the nodes).
- Is the Intel shared library available on all your nodes? (You didn't specify whether the applications that you are able to run were on all your compute nodes or just on the node where you compiled them.)

-Original Message-
From: Aniruddha Shet [mailto:s...@cse.ohio-state.edu]
Sent: Wednesday, April 12, 2006 12:17 PM
To: Open MPI Users
Cc: Jeff Squyres (jsquyres)
Subject: Re: [OMPI users] Error while loading shared libraries

Hi,

I am able to run non-OpenMPI MPI jobs where the MPI library is built on top of the Intel compilers. Plus, non-MPI jobs built with the Intel compilers run just fine. So, I am not sure how to go about fixing this.
Thanks,
Aniruddha

- Original Message -
From: "Jeff Squyres (jsquyres)"
To: "Open MPI Users"
Sent: Wednesday, April 12, 2006 8:30 AM
Subject: Re: [OMPI users] Error while loading shared libraries

> Greetings.
>
> Your logs look normal.
>
> The problem appears to be how you compiled / linked your final
> executable. You said that you used -static -- I don't know offhand if
> that is a supported flag for the Intel compiler or not. Did you *link*
> with -static, or just *compile* with it?
>
> Try running "ldd" on your executable -- it will show which shared
> libraries your executable links against.
>
> I *think* that libcprts.so is a library internal to the Intel compiler
> -- so even if icc supports "-static", this library may be exempted...?
> (that's a total guess -- I'm not familiar with the internals of the
> Intel compilers) If this is the case, you might try installing the
> Intel compiler run-time libraries on all your nodes (this seems
> unattractive, though).
>
> Regardless, I don't think that this is an MPI problem -- you might want
> to try playing around with compiling some simple [non-MPI] "hello world"
> applications with your Intel compiler to figure out how to compile
> things statically.
>
> > -Original Message-
> > From: users-boun...@open-mpi.org
> > [mailto:users-boun...@open-mpi.org] On Behalf Of Aniruddha Shet
> > Sent: Monday, April 10, 2006 10:06 PM
> > To: us...@open-mpi.org
> > Subject: [OMPI users] Error while loading shared libraries
> >
> > Hi,
> >
> > I have built OpenMPI using the ifort and icc Intel compilers with the
> > --enable-static --disable-shared options. I compile my job using the
> > OpenMPI wrapper compilers, additionally with the -static option. When I
> > run the job, I get the error 'orted:error while loading shared libraries:
> > libcprts.so.5: cannot open shared object file: No such file or directory'.
> > I also have the path of the Intel compiler libraries in LD_LIBRARY_PATH.
> > Please find attached a tar file containing config.log and ompi_info output.
> >
> > Thanks,
> > Aniruddha
> > --
> > Aniruddha Shet              | Project webpage: http://forge-fre.ornl.gov/molar/index.html
> > Graduate Research Associate | Project webpage: www.cs.unm.edu/~fastos
> > Dept. of Comp. Sci. & Engg  | Personal webpage: www.cse.ohio-state.edu/~shet
> > The Ohio State University   | Office: DL 474
> > 2015 Neil Avenue            | Phone: +1 (614) 292 7036
> > Columbus OH 43210-1277      | Cell: +1 (614) 446 1630
> > --
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
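A minimal sketch of Jeff's "ldd" suggestion, assuming a Linux node where ldd is installed. The helper name check_libs and the file name check_libs.sh are hypothetical. For each binary it reports whether the file is fully static or which shared libraries the dynamic loader cannot find. Because the error names orted rather than the application, it may be worth running it on orted as well as on the executable, on each compute node.

```sh
#!/bin/sh
# check_libs: for each file given, report whether it is fully static or
# which of its shared libraries the dynamic loader cannot resolve.
# Assumes a Linux system where ldd is available.
check_libs() {
    for bin in "$@"; do
        if [ ! -e "$bin" ]; then
            echo "$bin: no such file"
            continue
        fi
        if ! command -v ldd >/dev/null 2>&1; then
            echo "ldd is not installed on this node" >&2
            return 1
        fi
        # ldd prints "libfoo.so => not found" for each unresolved library,
        # and "not a dynamic executable" or "statically linked" for static files.
        out=$(ldd "$bin" 2>&1)
        if printf '%s\n' "$out" | grep -Eq 'not a dynamic executable|statically linked'; then
            echo "$bin: static, no shared libraries needed"
        elif printf '%s\n' "$out" | grep -q 'not found'; then
            echo "$bin: unresolved shared libraries:"
            printf '%s\n' "$out" | grep 'not found'
        else
            echo "$bin: all shared libraries resolved"
        fi
    done
}

check_libs "$@"
```

For example, `sh check_libs.sh ./ep.A.4 /home/osu4005/openmpi/openmpi_NO/bin/orted` on a compute node. A `libcprts.so.5 => not found` line under orted would be consistent with the reported error, and a "static" result for the application would suggest that the -static link took effect.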
Re: [OMPI users] Error while loading shared libraries
If you are using PBS, the environment of where you ran "qsub" is automatically copied out to the first node in your job where your script is run. Can you send your torque job script?

> -Original Message-
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Aniruddha Shet
> Sent: Thursday, April 13, 2006 12:13 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Error while loading shared libraries
>
> The error message is coming from all nodes.
>
> I explicitly add the path of Intel shared library to LD_LIBRARY_PATH on my
> mpiexec command, in addition to it being added in my shell startup file.
>
> I make a batch request to PBS. The Intel shared library is on a common file
> system across compute nodes.
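For rsh/ssh launches, Jeff's earlier advice to set LD_LIBRARY_PATH in the shell startup files on every node could look roughly like the fragment below. It is a sketch that assumes bash and a ~/.bashrc that non-interactive remote shells read; csh users would use setenv in ~/.cshrc instead. The Intel directory is the one from the job script in this thread and is only an example.

```sh
# Fragment for ~/.bashrc (bash users); csh users would use setenv in ~/.cshrc.
# Prepends the Intel runtime directory used in this thread, only once.
INTEL_LIB=/usr/local/intel-8.0-20040716/lib
case ":${LD_LIBRARY_PATH:-}:" in
    *":$INTEL_LIB:"*) ;;   # already on the path: leave it alone
    *) LD_LIBRARY_PATH="$INTEL_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
export LD_LIBRARY_PATH
```

The case guard keeps the directory from being appended again each time the file is sourced.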
Re: [OMPI users] Error while loading shared libraries
#PBS -l walltime=0:01:00
#PBS -l nodes=4:ppn=2
#PBS -N aniruddha_job
#PBS -S /bin/bash
cd $HOME/NPB/NPB3.2/NPB3.2-MPI/bin/OMPI/EP/A/4_NO
/home/osu4005/openmpi/openmpi_NO/bin/mpiexec --bynode --prefix /home/osu4005/openmpi/openmpi_NO --mca btl mvapi -n 4 LD_LIBRARY_PATH=/usr/local/intel-8.0-20040716/lib:$LD_LIBRARY_PATH ./ep.A.4 > results.ep.A.4

- Original Message -
From: "Jeff Squyres (jsquyres)"
To: "Open MPI Users"
Sent: Thursday, April 13, 2006 7:42 AM
Subject: Re: [OMPI users] Error while loading shared libraries

If you are using PBS, the environment of where you ran "qsub" is automatically copied out to the first node in your job where your script is run. Can you send your torque job script?
Re: [OMPI users] Error while loading shared libraries
I don't think the LD_LIBRARY_PATH setting belongs on the mpiexec command line; shouldn't you set it before calling mpiexec?

Ralph
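Following Ralph's suggestion, here is a sketch of the job script with LD_LIBRARY_PATH exported before mpiexec is called. As originally written, mpiexec most likely treats the `LD_LIBRARY_PATH=...` word as the name of the program to launch. The `-x LD_LIBRARY_PATH` option, which asks mpiexec to export that variable to the launched processes, is an addition that was not discussed in the thread; check `mpiexec --help` for your version, and note that it is not certain to affect orted itself.

```sh
#!/bin/bash
#PBS -l walltime=0:01:00
#PBS -l nodes=4:ppn=2
#PBS -N aniruddha_job
#PBS -S /bin/bash

# Sketch: set LD_LIBRARY_PATH in the script itself, before mpiexec runs,
# rather than passing "LD_LIBRARY_PATH=..." as an argument to mpiexec.
OMPI_PREFIX=/home/osu4005/openmpi/openmpi_NO
INTEL_LIB=/usr/local/intel-8.0-20040716/lib
export LD_LIBRARY_PATH="$INTEL_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

cd "$HOME/NPB/NPB3.2/NPB3.2-MPI/bin/OMPI/EP/A/4_NO" || exit 1

# -x LD_LIBRARY_PATH (an assumption, not from the thread) asks mpiexec to
# export the variable to the launched application processes.
"$OMPI_PREFIX/bin/mpiexec" --bynode --prefix "$OMPI_PREFIX" --mca btl mvapi \
    -x LD_LIBRARY_PATH -n 4 ./ep.A.4 > results.ep.A.4
```

Only the LD_LIBRARY_PATH handling differs from the original script; the PBS directives, BTL selection and process count are unchanged.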
Re: [OMPI users] Problem with 1.0.2 and PGI 6.0
This is all I get. No core dump, no nothing :(

> Do you get any more of an error message than that? Did the process dump
> core, and if so, what does a backtrace show?
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeffrey B. Layton
> Sent: Wednesday, April 12, 2006 3:12 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] Problem with 1.0.2 and PGI 6.0
>
> Hello,
>
> I got OpenMPI 1.0.2 built with PGI 6.0, which fixed my previous problem
> (problem with 1.0.1 and multiple tcp networks). However, when I tried to
> run the "is" code from the NPB, I get the following error:
>
> [0] func:/home/jlayton/bin/OPENMPI-1.0.2-PGI6.0-OPTERON/lib/libopal.so.0 [0x2a95e2d4a4]
>
> Any ideas on this error?
>
> Thanks!
>
> Jeff
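One way to get the core file Jeff asks about, assuming a POSIX shell and gdb, is to raise the core-size soft limit before launching. The helper name show_backtrace and the file names in its comment are hypothetical. Limits set in a local shell may not reach processes started on other nodes by rsh/ssh or a resource manager, so the same limit may also need to be set in the startup files there.

```sh
#!/bin/sh
# Sketch: let this shell (and the programs it starts) write core files by
# raising the soft core-size limit to the hard limit, which may be "unlimited".
ulimit -S -c "$(ulimit -H -c)"
echo "core file size limit: $(ulimit -S -c)"

# Hypothetical helper: print a backtrace from a core file without an
# interactive gdb session, e.g. show_backtrace ./is.A.4 core.12345
show_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}
```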
Re: [OMPI users] running a job problem
On Apr 12, 2006, at 9:09 AM, liuli...@stat.ohio-state.edu wrote:

> We have a Mac network running Xgrid and we have successfully installed
> MPI. We want to run a parallel version of MrBayes. We did not have any
> problem when we compiled MrBayes using mpicc. But when we tried to run
> the compiled MrBayes, we got lots of error messages:
>
> mpiexec -np 4 ./mb -i yeast_noclock_imp.txt
> Parallel version of
>
> Parallel version of
>
> Parallel version of
>
> Parallel version of
>
> [ea285fltprinter.scc.ohio-state.edu:03327] *** An error occurred in MPI_comm_size
> [ea285fltprinter.scc.ohio-state.edu:03327] *** on communicator MPI_COMM_WORLD
> [ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERR_COMM: invalid communicator
> [ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERRORS_ARE_FATAL (goodbye)

This indicates that the application is calling an MPI function with an invalid communicator. Unfortunately, this is a hard one to track down without more information. What version of MrBayes are you using, and can you share your input deck?

Thanks,

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/
Re: [OMPI users] running a job problem
Brian,

It worked when I used the latest version of MrBayes. Thanks.

By the way, do you have any idea how to submit an Open MPI job on Xgrid?

Thanks again.

Liang
[OMPI users] how can I tell for sure that I'm using mvapi
I'm running on a cluster with mvapi. I built with mvapi and it runs, but I want to make absolutely sure that I'm using the IB interconnect and nothing else. How can I tell specifically which interconnect I'm using when I run?

Bernie Borenstein
The Boeing Company
Re: [OMPI users] how can I tell for sure that I'm using mvapi
Hi Bernie,

You may specify which BTLs to use at runtime using an MCA parameter:

mpirun -np 2 -mca btl self,mvapi ./my_app

This specifies to use only self (loopback) and mvapi. You may also want to use sm (shared memory) if you have multi-core or multi-processor nodes, such as:

mpirun -np 2 -mca btl self,sm,mvapi ./my_app

This is also in the FAQ:
http://www.open-mpi.org/faq/?category=tuning#selecting-components

And for mvapi/openib performance considerations:
http://www.open-mpi.org/faq/?category=infiniband#ib-leave-pinned

Thanks,

Galen

On Apr 13, 2006, at 7:56 PM, Borenstein, Bernard S wrote:

> I’m running on a cluster with mvapi. I built with mvapi and it runs, but I
> want to make absolutely sure that I’m using the IB interconnect and nothing
> else. How can I tell specifically what interconnect I’m using when I run.
>
> Bernie Borenstein
> The Boeing Company
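A sketch of checking this before a run, assuming ompi_info and mpirun are on PATH and that ompi_info lists components in lines such as `MCA btl: mvapi (...)`. The helpers has_btl and run_ib_only are hypothetical; they refuse to start unless the mvapi BTL was built, then restrict Open MPI to self, sm and mvapi as Galen describes. With the BTL list restricted this way there is no TCP path to fall back on, so a run that completes should be using mvapi between nodes.

```sh
#!/bin/sh
# has_btl NAME: succeed if ompi_info lists a BTL component called NAME.
# Relies on ompi_info printing lines such as "MCA btl: mvapi (...)".
has_btl() {
    ompi_info | grep -q "MCA btl: $1 "
}

# run_ib_only [NP]: refuse to start unless mvapi is available, then restrict
# Open MPI to loopback (self), shared memory (sm) and InfiniBand (mvapi).
# "./my_app" and the default process count of 2 are placeholders.
run_ib_only() {
    if ! has_btl mvapi; then
        echo "mvapi BTL not found in ompi_info output" >&2
        return 1
    fi
    mpirun -np "${1:-2}" -mca btl self,sm,mvapi ./my_app
}
```

Usage: `run_ib_only 2`, after sourcing the file in the shell that launches the job.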