Ralph,

Thank you for your quick response.

Indeed, as you expected, "printenv | grep PBS" produced nothing.
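
For completeness, here is roughly how I checked (the script name and the small node count below are only an example, not my actual job):

#!/bin/sh
# check_env.sh -- submitted with:  qsub -l nodes=2:ppn=2 check_env.sh
printenv | grep PBS      # on my system this prints nothing
echo $PBS_NODEFILE       # if set, this should name the file that lists the allocated nodes

If I understand your note correctly, at least PBS_JOBID and PBS_NODEFILE should have shown up here.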

BTW, here is my current queue and server configuration:

> qmgr -c 'p s'

# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.nodes = 7
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = nagrp2
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.nodect = 6
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 793
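
As far as I can tell, this dump only covers the queue and server attributes, and the compute nodes themselves are listed in the server's nodes file (if I have it right). I expect mine to contain something like the following -- the hostnames and the path are placeholders depending on how Torque was installed, not my actual entries:

# server's nodes file, e.g. /var/spool/torque/server_priv/nodes
# one line per compute node: hostname plus the number of processors
node01 np=2
node02 np=2
node03 np=2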

- I am not sure what is missing from my configuration, or how to fix it.  Do 
you mean the installation "configure" step with its optional directives, or 
something else?
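
In the meantime, here are the checks I plan to run next, in case they help narrow this down (node names and paths are placeholders, and the comments describe what I expect to see rather than what I have verified):

# on the head node: confirm every compute node is known to pbs_server and is not down
pbsnodes -a

# inside a job submitted with "qsub -l nodes=6:ppn=2":
cat $PBS_NODEFILE                      # should list each allocated node once per requested slot
/path/to/mpirun /path/to/my_program    # with tm support, no -np and no -machinefile should be needed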

Thank you,

Zhiliang

At 07:16 PM 9/28/2008 -0600, you wrote:
>Hi Zhiliang
>
>The first thing to check is that your Torque system is defining and  
>setting the environment variables we expect in a Torque system. It is  
>quite possible that your Torque system isn't configured as we expect.
>
>Can you run a job and send us the output from "printenv | grep PBS"?  
>We should see a PBS jobid, the name of the file containing the names  
>of the allocated nodes, etc.
>
>Since you are able to run with -machinefile, my guess is that your  
>system isn't setting those environment variables as we expect. In  
>that case, you will have to keep specifying the machinefile by hand.
>
>Thanks
>Ralph
>
>On Sep 28, 2008, at 7:02 PM, Zhiliang Hu wrote:
>
>>I have asked this question on the TorqueUsers list.  Responses from  
>>that list suggested that the question be asked on this list:
>>
>>The situation is:
>>
>>I can submit my jobs as in:
>>>qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>
>>where "mpi_program" is:
>>/path/to/mpirun -np 12 /path/to/my_program
>>
>>-- however, everything ran on the head node (one time it ran on the  
>>first compute node).  The jobs do complete anyway.
>>
>>While mpirun can run fine on its own when given a "-machinefile",  
>>Glen (among others) has pointed out, as does this web site 
>>http://wiki.hpc.ufl.edu/index.php/Common_Problems (I got the same error 
>>as the last example on that page), that it is not a good idea to  
>>provide a machinefile since it is "already handled by OpenMPI and  
>>Torque".
>>
>>My question is: why are OpenMPI and Torque not distributing the jobs  
>>across all nodes?
>>
>>ps 1:
>>OpenMPI was configured and installed with the "--with-tm" option,  
>>and "ompi_info" does show the lines:
>>
>>MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.7)
>>MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.7)
>>
>>ps 2:
>>"/path/to/mpirun -np 12 -machinefile /path/to/machinefile /path/to/my_program"
>>works normally (it sends jobs to all nodes).
>>
>>Thanks,
>>
>>Zhiliang
>>
