Hi Zhiliang
This has nothing to do with how you configured Open MPI. The issue is
that your Torque queue manager isn't setting the expected environment
variables to tell us the allocation. I'm not sure why it wouldn't be
doing so, and I'm afraid I'm not enough of a Torque person to know how
to guide you.
What is happening, though, is that we are actually launching via ssh
instead of Torque since we don't see the Torque system. Your system
appears happy to let us do so, so this may not be a real problem for
you other than the annoyance of having to specify the machinefile
every time.
I'm curious as to how you found the machinefile - what is the file
named? In a typical Torque install, the file sits in a default tmp
directory and is given a name that includes the PBS jobid. Since you
didn't find that environment variable, how did you know which filename
to pass to mpirun?
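For illustration only (the path below is typical of a default Torque
install, not necessarily yours), inside a running job you would
normally be able to do:

   echo $PBS_NODEFILE
   # e.g. /var/spool/torque/aux/<jobid>
   cat $PBS_NODEFILE
   # one hostname per allocated processor slot

Since your printenv showed no PBS_* variables at all, none of that
appears to be getting set in your jobs.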
Thanks
Ralph
On Sep 28, 2008, at 8:07 PM, Zhiliang Hu wrote:
Ralph,
Thank you for your quick response.
Indeed as you expected, "printenv | grep PBS" produced nothing.
BTW, I have:
qmgr -c 'p s'
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.nodes = 7
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = nagrp2
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.nodect = 6
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 793
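In case it is relevant, the registered compute nodes and their state
could also be checked with the standard Torque command below; I have
not included that output here.

   pbsnodes -a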
- I am not sure what is missing from my configuration (do you mean
the installation "configure" step with optional directives, or
something else)?
Thank you,
Zhiliang
At 07:16 PM 9/28/2008 -0600, you wrote:
Hi Zhiliang
The first thing to check is whether your Torque system is defining and
setting the environment variables we expect from a Torque system. It
is quite possible that your Torque system isn't configured as we
expect.
Can you run a job and send us the output from "printenv | grep PBS"?
We should see a PBS jobid, the name of the file containing the names
of the allocated nodes, etc.
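If it helps, one quick way to do that is to submit a trivial job that
just prints its environment (a minimal sketch - adjust the resource
request for your site):

   echo 'printenv | grep PBS' | qsub -l nodes=1

The output should then land in a STDIN.o<jobid> file in the directory
you submitted from.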
Since you are able to run with -machinefile, my guess is that your
system isn't setting those environment variables as we expect. In
that case, you will have to keep specifying the machinefile by hand.
Thanks
Ralph
On Sep 28, 2008, at 7:02 PM, Zhiliang Hu wrote:
I have asked this question on the TorqueUsers list. Responses from
that list suggest that the question be asked on this list:
The situation is:
I can submit my jobs as in:
qsub -l nodes=6:ppn=2 /path/to/mpi_program
where "mpi_program" is:
/path/to/mpirun -np 12 /path/to/my_program
-- however, everything runs on the head node (one time it ran on the
first compute node). The jobs do complete anyway.
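For clarity, "mpi_program" here is just a small wrapper script along
these lines (assuming a plain sh wrapper; paths abbreviated):

   #!/bin/sh
   /path/to/mpirun -np 12 /path/to/my_program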
While mpirun can run on its own when a "-machinefile" is specified,
Glen among others, and also this web site http://wiki.hpc.ufl.edu/index.php/Common_Problems
(I got the same error as the last example on that web page), point out
that it's not a good idea to provide a machinefile since it's "already
handled by OpenMPI and Torque".
My question is: why are OpenMPI and Torque not distributing the jobs
to all nodes?
ps 1:
OpenMPI was configured and installed with the "--with-tm" option, and
"ompi_info" does show the lines:
MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.7)
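(Those lines can be found by grepping the full output, e.g. with
something like "ompi_info | grep tm".)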
ps 2:
"/path/to/mpirun -np 12 -machinefile /path/to/machinefile /path/
to/ my_program"
works normal (send jobs to all nodes).
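For reference, the machinefile is just a plain-text list of hosts, one
per line, optionally with a slot count - the hostnames below are only
placeholders:

   node1 slots=2
   node2 slots=2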
Thanks,
Zhiliang
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users