Hi Abhisek

On 08/03/2015 12:59 PM, abhisek Mondal wrote:
Hi,

   I'm using openmpi-1.6.4 to distribute a job across 2 different nodes
using this command:
"mpirun --hostfile myhostfile -np 10 nwchem my_code.nw"
Here, "myhostfile" contains:
cx0937 slots=5
cx0934 slots=5

I am assuming by pbs you mean Torque.
If your Open MPI was built with Torque support (--with-tm),
then you don't even need the --hostfile option
(and probably shouldn't use it).
Unless nwchem behaves in a very non-standard way,
which I can't really speak to.

Open MPI will use the nodes provided by Torque.

To check this, do:

ompi_info | grep tm

In the Open MPI parlance, Torque is "tm".
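
For instance (a sketch; the exact lines vary with the Open MPI version and build), a Torque-aware build should show tm components:

$ ompi_info | grep tm
# Expect output along these lines if --with-tm was used:
#   MCA ras: tm (MCA v2.0, API v2.0, Component v1.6.4)
#   MCA plm: tm (MCA v2.0, API v2.0, Component v1.6.4)
# No output means your Open MPI was built without Torque support.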


But as I have to submit the jobs using a .pbs script, I'm wondering how,
in this case, "mpirun" is going to choose the nodes (free node allocation
is done by pbs) from "myhostfile".
I mean, will "mpirun" wait until the specific nodes (as mentioned in
myhostfile) become free, and then start?
How can I forward the allocated node names (assigned by pbs) to the mpirun command?

A little light on this matter would be really great.


Your script only starts after Torque has allocated the nodes and launched the script on the first node.
mpirun doesn't choose the nodes; it uses them.
If you are using Torque it may be worth looking into some of its environment variables.

"man qsub" will tell you a lot about them, and probably will clarify
many things more.

Some very useful ones are (a snippet after this list shows how to inspect them):

       PBS_O_WORKDIR
              the absolute path of the current working directory of the qsub command.

       PBS_JOBID
              the job identifier assigned to the job by the batch system.

       PBS_JOBNAME
              the job name supplied by the user.

       PBS_NODEFILE
              the name of the file containing the list of nodes assigned to the job (for parallel and cluster systems).
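
As a quick illustration (a minimal sketch, not from the man page; the job name and resource request are assumptions), you can echo these variables from inside a job script to see what Torque hands you:

#!/bin/bash
#PBS -N env_check               # hypothetical job name
#PBS -l nodes=2:ppn=5           # e.g. 2 nodes, 5 cores each

echo "Job ID:     $PBS_JOBID"
echo "Job name:   $PBS_JOBNAME"
echo "Submit dir: $PBS_O_WORKDIR"
echo "Node file:  $PBS_NODEFILE"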


***

In your script you can

cd $PBS_O_WORKDIR

since by default Torque puts you in your home directory on the compute node, which may not be where you want to be.

Another way to document the nodes that you're using is to put this
line in your script:

cat $PBS_NODEFILE

which will list the nodes (each repeated as many times as the cores/CPUs you requested from that node). Incidentally, if you ever want to use the
mpirun --hostfile option, the file to pass would be $PBS_NODEFILE.
[You don't need to do this if Open MPI was built with Torque support.]
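
Putting it together, a minimal sketch of a PBS script for your case (the job name, resource request, and nwchem command line are assumptions based on your 2-node, 5-slot example):

#!/bin/bash
#PBS -N nwchem_job               # hypothetical job name
#PBS -l nodes=2:ppn=5            # 2 nodes x 5 cores = 10 processes

cd $PBS_O_WORKDIR                # return to the submission directory

cat $PBS_NODEFILE                # record the nodes Torque allocated

# With Torque (tm) support built into Open MPI, no hostfile is needed:
mpirun -np 10 nwchem my_code.nw

# Without tm support, pass Torque's node file explicitly:
# mpirun --hostfile $PBS_NODEFILE -np 10 nwchem my_code.nw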



I hope this helps.
Gus Correa


Thank you.

--
Abhisek Mondal
Research Fellow
Structural Biology and Bioinformatics
Indian Institute of Chemical Biology
Kolkata 700032
INDIA

