Hi Zhiliang
First thing to check is that your Torque system is defining and
setting the environment variables we expect from a Torque system.
It is quite possible that your Torque system isn't configured as we
expect.
Can you run a job and send us the output from "printenv | grep PBS"?
We should see a PBS jobid, the name of the file containing the names
of the allocated nodes, etc.
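For reference, on a Torque system configured the way we expect, the
output usually looks something like this (the job ID, host names, and
paths below are illustrative):

  $ printenv | grep PBS
  PBS_JOBID=1234.headnode.example.org
  PBS_NODEFILE=/var/spool/torque/aux/1234.headnode.example.org
  PBS_ENVIRONMENT=PBS_BATCH
  PBS_O_WORKDIR=/home/zhiliang

In particular, PBS_NODEFILE should point to the file containing the
names of the allocated nodes.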
Since you are able to run with -machinefile, my guess is that your
system isn't setting those environment variables as we expect. In
that case, you will have to keep specifying the machinefile by hand.
Thanks
Ralph
On Sep 28, 2008, at 7:02 PM, Zhiliang Hu wrote:
I have asked this question on the TorqueUsers list. Responses from
that list suggest that the question be asked on this list:
The situation is:
I can submit my jobs as in:
qsub -l nodes=6:ppn=2 /path/to/mpi_program
where "mpi_program" is:
/path/to/mpirun -np 12 /path/to/my_program
-- however, all the processes ran on the head node (one time, on the
first compute node). The jobs do complete anyway.
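(For completeness, the job script amounts to something like this --
paths are placeholders:

  #!/bin/sh
  #PBS -l nodes=6:ppn=2
  # change to the directory the job was submitted from
  cd $PBS_O_WORKDIR
  # with tm support, mpirun should learn the allocated nodes from
  # Torque itself, so no -machinefile is given here:
  /path/to/mpirun -np 12 /path/to/my_program
)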
While mpirun can run on its own when given a "-machinefile", it was
pointed out by Glen among others, and also on this web site http://wiki.hpc.ufl.edu/index.php/Common_Problems
(I got the same error as the last example on that web page), that
it's not a good idea to provide a machinefile, since that's "already
handled by OpenMPI and Torque".
My question is: why aren't OpenMPI and Torque distributing the jobs
across all the nodes?
ps 1:
OpenMPI was configured and installed with the "--with-tm" option,
and "ompi_info" does show these lines:
MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.7)
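(For reference, a quick way to check for these two components,
assuming ompi_info is in the PATH:

  ompi_info | grep tm
)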
ps 2:
"/path/to/mpirun -np 12 -machinefile /path/to/machinefile /path/to/
my_program"
works normal (send jobs to all nodes).
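(In case it matters, the machinefile is just the list of allocated
hosts, one per line, with a slot count -- host names illustrative:

  node01 slots=2
  node02 slots=2
  node03 slots=2
  node04 slots=2
  node05 slots=2
  node06 slots=2
)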
Thanks,
Zhiliang