Ralph,

Thank you for your quick response.
Indeed, as you expected, "printenv | grep PBS" produced nothing. BTW, here is my server configuration:

> qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default resources_default.nodes = 7
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = nagrp2
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.nodect = 6
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 793

I am not sure what is missing from my configuration (do you mean the installation "configure" step with optional directives, or something else?).

Thank you,
Zhiliang

At 07:16 PM 9/28/2008 -0600, you wrote:
>Hi Zhiliang
>
>First thing to check is that your Torque system is defining and
>setting the environment variables we are expecting in a Torque
>system. It is quite possible that your Torque system isn't configured
>as we expect.
>
>Can you run a job and send us the output from "printenv | grep PBS"?
>We should see a PBS job ID, the name of the file containing the names
>of the allocated nodes, etc.
>
>Since you are able to run with -machinefile, my guess is that your
>system isn't setting those environment variables as we expect. In
>that case, you will have to keep specifying the machinefile by hand.
>
>Thanks
>Ralph
>
>On Sep 28, 2008, at 7:02 PM, Zhiliang Hu wrote:
>
>>I have asked this question on the TorqueUsers list.
>>Responses from that list suggest that the question be asked on this list:
>>
>>The situation is:
>>
>>I can submit my jobs as in:
>>>qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>
>>where "mpi_program" is:
>>/path/to/mpirun -np 12 /path/to/my_program
>>
>>-- however, everything runs on the head node (one time it ran on the
>>first compute node). The jobs do complete anyway.
>>
>>While mpirun can run on its own by specifying a "-machinefile",
>>it was pointed out by Glen, among others, and also on this web site
>>http://wiki.hpc.ufl.edu/index.php/Common_Problems (I got the same error as
>>the last example on that web page), that it is not a good idea to
>>provide a machinefile, since it is "already handled by OpenMPI and Torque".
>>
>>My question is: why are Open MPI and Torque not distributing the jobs
>>to all nodes?
>>
>>ps 1:
>>Open MPI was configured and installed with the "--with-tm" option,
>>and "ompi_info" does show the lines:
>>
>>MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.7)
>>MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.7)
>>
>>ps 2:
>>"/path/to/mpirun -np 12 -machinefile /path/to/machinefile /path/to/my_program"
>>works normally (it sends jobs to all nodes).
>>
>>Thanks,
>>
>>Zhiliang
>>
>>_______________________________________________
>>users mailing list
>>us...@open-mpi.org
>>http://www.open-mpi.org/mailman/listinfo.cgi/users
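PS: In case it helps other readers of the thread, here is a minimal sketch of the diagnostic job script Ralph is asking for. The resource request and the mpirun/program paths are placeholders for illustration only. Submitted with qsub, it reports whether Torque exported its PBS variables into the job environment (outside a Torque job, each variable shows as "(unset)"):

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2
# Print each PBS variable, or "(unset)" if Torque did not export it.
echo "PBS_JOBID=${PBS_JOBID:-(unset)}"
echo "PBS_NODEFILE=${PBS_NODEFILE:-(unset)}"
echo "PBS_O_WORKDIR=${PBS_O_WORKDIR:-(unset)}"
echo "PBS_ENVIRONMENT=${PBS_ENVIRONMENT:-(unset)}"

# If PBS_NODEFILE is set, show the allocated nodes as well.
if [ -n "$PBS_NODEFILE" ]; then
    cat "$PBS_NODEFILE"
fi

# Once those variables show up, the launch should not need a machinefile;
# with tm support, mpirun discovers the allocation itself, e.g.:
# /path/to/mpirun -np 12 /path/to/my_program
```

If the variables print as "(unset)" inside a submitted job, the problem is on the Torque side (the MOMs are not exporting the job environment), not in the Open MPI tm components.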