In that case, the way I installed it is not right. I thought that only the HN 
should be configured with tm support, 
not the worker nodes; the worker nodes only have the PBS daemon clients, so 
I assumed there was no need for tm support on them.
 
When I ran "ompi_info | grep tm" on the worker nodes, the output was empty.
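
For comparison, the same check on the head node (quoted later in this
thread) does list the tm components:

$ ompi_info | grep tm
  MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.3)

so an empty result on WN1 and WN2 does suggest their builds lack tm support.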

The information on the following link misled me, then:
http://www.physics.iitm.ac.in/~sanoop/linux_files/cluster.html 
(see the OpenMPI Configuration section.)

~Belaid.
> Date: Tue, 1 Dec 2009 18:36:15 -0500
> From: g...@ldeo.columbia.edu
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] mpirun is using one PBS node only
> 
> Hi Belaid Moa
> 
> The OpenMPI I install and use is on a NFS mounted directory.
> Hence, all the nodes see the same version, which has "tm" support.
> 
> After reading your OpenMPI configuration parameters on the headnode
> and working nodes (and the difference between them),
> I would guess (just a guess) that the problem you see is because your
> OpenMPI installation on the nodes (probably) does not have Torque support.
> 
> However, you should first verify that this is really the case,
> because if the OpenMPI configure script
> finds the torque libraries it will (probably) configure and
> install OpenMPI with "tm" support, even if you don't ask it
> explicitly on the working nodes.
> Hence, ssh to WN1 or WN2 and do "ompi_info" to check this out first.
> 
> If there is no Torque on WN1 and WN2 then OpenMPI won't find it
> and you won't have "tm" support on the nodes.
> 
> In any case, if OpenMPI "tm" support is missing on WN[1,2],
> I would suggest that you reinstall OpenMPI on WN1 and WN2 *with tm support*.
> This will require that you have Torque on the working nodes also,
> and that you use the same configure command line you used on the headnode.
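> For example, a sketch (assuming Torque lives under /var/spool/torque on
> the working nodes too, as on your headnode):
> 
> ./configure --with-tm=/var/spool/torque --enable-static
> make all install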
> 
> A low-tech alternative is to copy over your OpenMPI directory tree to 
> the WN1 and WN2 nodes.
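> For instance, with rsync (illustrative only; this assumes a dedicated
> install prefix such as /usr/local/openmpi, so adjust to your actual prefix):
> 
> rsync -a /usr/local/openmpi/ WN1:/usr/local/openmpi/
> rsync -a /usr/local/openmpi/ WN2:/usr/local/openmpi/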
> 
> An even simpler alternative is to reinstall OpenMPI on the headnode
> in an NFS-mounted directory (as I do here), then
> add the corresponding "bin" path to your PATH,
> and the corresponding "lib" path to your LD_LIBRARY_PATH environment
> variables.
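> For instance, if the NFS-mounted prefix were /shared/openmpi (a
> placeholder; use your actual path):
> 
> export PATH=/shared/openmpi/bin:$PATH
> export LD_LIBRARY_PATH=/shared/openmpi/lib:$LD_LIBRARY_PATH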
> 
> Think about maintenance and upgrades:
> On an NFS mounted directory
> you need to install only once, whereas the way you have it now you need
> to do it N+1 times (or have a mechanism to propagate a single
> installation from the head node to the compute nodes).
> 
> NFS is your friend!  :)
> 
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
> 
> 
> Belaid MOA wrote:
> > I tried the -bynode option, but it did not change anything. I also tried 
> > the "hostname" command, and I keep getting the name of only one node, 
> > repeated according to the -n value.
> > 
> > Just to make sure I did the right installation, here is what I did:
> > 
> > -- On the head node (HN), I installed openMPI using the --with-tm option 
> > as follows:
> > 
> > ./configure --with-tm=/var/spool/torque --enable-static
> > make all install
> > 
> > -- On the worker nodes (WN1 and WN2), I installed openMPI without tm 
> > option as follows (it is a local installation on each worker node):
> > 
> > ./configure --enable-static
> > make all install
> > 
> > Is this correct?
> > 
> > Thanks a lot in advance.
> > ~Belaid.
> >  > Date: Tue, 1 Dec 2009 17:07:58 -0500
> >  > From: g...@ldeo.columbia.edu
> >  > To: us...@open-mpi.org
> >  > Subject: Re: [OMPI users] mpirun is using one PBS node only
> >  >
> >  > Hi Belaid Moa
> >  >
> >  > Belaid MOA wrote:
> >  > > Thanks a lot Gus for your help again. I only have one CPU per node.
> >  > > The -n X option (no matter what the value of X is) shows X processes
> >  > > running on one node only (the other one is free).
> >  >
> >  > So, somehow it is oversubscribing your single processor
> >  > on the first node.
> >  >
> >  > A simple diagnostic:
> >  >
> >  > Have you tried to run "hostname" on the two nodes through Torque/PBS
> >  > and mpiexec?
> >  >
> >  > [PBS directives, cd $PBS_O_WORKDIR, etc]
> >  > ...
> >  > /full/path/to/openmpi/bin/mpiexec -n 2 hostname
> >  >
> >  > Try also with the -byslot and -bynode options.
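> >  > For example (same job script, only the mpiexec line changes):
> >  >
> >  > /full/path/to/openmpi/bin/mpiexec -bynode -n 2 hostname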
> >  >
> >  >
> >  > > If I add the machinefile option with WN1 and WN2 in it, the right
> >  > > behavior occurs. According to the documentation,
> >  > > mpirun should get the PBS_NODEFILE automatically from PBS.
> >  >
> >  > Yes, if you compiled the OpenMPI you are using with Torque ("tm") support.
> >  > Did you?
> >  > Make sure it has "tm" support.
> >  > Run "ompi_info" (with the full path, if needed) to check that.
> >  > Are you sure /usr/local/bin/mpirun is the mpirun you want?
> >  > Linux distributions, compilers, and other tools come with their own
> >  > mpiexec and put it in places you may not suspect, so double check
> >  > that you are getting the one you want.
> >  > That has been a source of repeated confusion on this and other
> >  > mailing lists.
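> >  > A quick check (illustrative):
> >  >
> >  > $ which mpirun
> >  > $ mpirun --version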
> >  >
> >  > Also, make sure that passwordless ssh across the nodes is working.
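> >  > For example, "ssh WN1 hostname" run from the headnode should print
> >  > the node name without prompting for a password.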
> >  >
> >  > Yet another thing to check: for easy name resolution,
> >  > the /etc/hosts file on *all* nodes, including the headnode,
> >  > should list all the nodes and their IP addresses.
> >  > Something like this:
> >  >
> >  > 127.0.0.1 localhost.localdomain localhost
> >  > 192.168.0.1 WN1
> >  > 192.168.0.2 WN2
> >  >
> >  > (The IPs above are guesswork of mine, you know better which to use.)
> >  >
> >  > > So, I do
> >  > > not need to use machinefile.
> >  > >
> >  >
> >  > True, assuming the first condition above (OpenMPI *with* "tm" support).
> >  >
> >  > > Any ideas?
> >  > >
> >  >
> >  > Yes, and I sent it to you in my last email!
> >  > Try the "-bynode" option of mpiexec.
> >  > ("man mpiexec" is your friend!)
> >  >
> >  > > Thanks a lot in advance.
> >  > > ~Belaid.
> >  > >
> >  >
> >  > Best of luck!
> >  > Gus Correa
> >  > ---------------------------------------------------------------------
> >  > Gustavo Correa
> >  > Lamont-Doherty Earth Observatory - Columbia University
> >  > Palisades, NY, 10964-8000 - USA
> >  > ---------------------------------------------------------------------
> >  >
> >  > PS - Your web site link to Paul Krugman is out of date.
> >  > Here is one link to his (active) blog,
> >  > and another to his (no longer updated) web page: :)
> >  >
> >  > http://krugman.blogs.nytimes.com/
> >  > http://www.princeton.edu/~pkrugman/
> >  >
> >  > >
> >  > > > Date: Tue, 1 Dec 2009 15:42:30 -0500
> >  > > > From: g...@ldeo.columbia.edu
> >  > > > To: us...@open-mpi.org
> >  > > > Subject: Re: [OMPI users] mpirun is using one PBS node only
> >  > > >
> >  > > > Hi Belaid Moa
> >  > > >
> >  > > > Belaid MOA wrote:
> >  > > > > Hi everyone,
> >  > > > > Here is another elementary question. I tried the following steps
> >  > > > > found in the FAQ section of www.open-mpi.org with a simple
> >  > > > > hello world example (with PBS/torque):
> >  > > > > $ qsub -l nodes=2 my_script.sh
> >  > > > >
> >  > > > > my_script.sh is pasted below:
> >  > > > > ========================
> >  > > > > #!/bin/sh -l
> >  > > > > #PBS -N helloTest
> >  > > > > #PBS -j eo
> >  > > > > echo `cat $PBS_NODEFILE` # shows two nodes: WN1 WN2
> >  > > > > cd $PBS_O_WORKDIR
> >  > > > > /usr/local/bin/mpirun hello
> >  > > > > ========================
> >  > > > >
> >  > > > > When the job is submitted, only one process is run. When I add
> >  > > > > the -n 2 option to the mpirun command,
> >  > > > > two processes are run, but on one node only.
> >  > > >
> >  > > > Do you have a single CPU/core per node?
> >  > > > Or are they multi-socket/multi-core?
> >  > > >
> >  > > > Check "man mpiexec" for the options that control on which nodes and
> >  > > > slots, etc your program will run.
> >  > > > ("Man mpiexec" will tell you more than I possibly can.)
> >  > > >
> >  > > > The default option is "-byslot",
> >  > > > which will use all "slots" (actually cores
> >  > > > or CPUs) available on a node before it moves to the next node.
> >  > > > Reading your question and your surprise with the result,
> >  > > > I would guess what you want is "-bynode" (not the default).
> >  > > >
> >  > > > Also, if you have more than one CPU/core per node,
> >  > > > you need to put this information in your Torque/PBS "nodes" file
> >  > > > (and restart your pbs_server daemon).
> >  > > > Something like this (for 2 CPUs/cores per node):
> >  > > >
> >  > > > WN1 np=2
> >  > > > WN2 np=2
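> >  > > >
> >  > > > To restart it, something like this (the exact commands depend on
> >  > > > your setup; qterm/pbs_server shown only as an illustration):
> >  > > >
> >  > > > qterm -t quick
> >  > > > pbs_server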
> >  > > >
> >  > > > I hope this helps,
> >  > > > Gus Correa
> >  > > > ---------------------------------------------------------------------
> >  > > > Gustavo Correa
> >  > > > Lamont-Doherty Earth Observatory - Columbia University
> >  > > > Palisades, NY, 10964-8000 - USA
> >  > > > ---------------------------------------------------------------------
> >  > > >
> >  > > >
> >  > > > > Note that echo `cat $PBS_NODEFILE` outputs
> >  > > > > the two nodes I am using: WN1 and WN2.
> >  > > > >
> >  > > > > The output from ompi_info is shown below:
> >  > > > >
> >  > > > > $ ompi_info | grep tm
> >  > > > > MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.3)
> >  > > > > MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.3)
> >  > > > > MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.3)
> >  > > > >
> >  > > > > Any help on why openMPI/mpirun is using only one PBS node is very
> >  > > > > appreciated.
> >  > > > >
> >  > > > > Thanks a lot in advance and sorry for bothering you guys with my
> >  > > > > elementary questions!
> >  > > > >
> >  > > > > ~Belaid.