Hi there,


Recently, I've begun some calculations on a cluster where I submit a
multi-node job to the Torque batch system, and the job executes multiple
single-node parallel tasks.  That is to say, each task is intended to use
OpenMPI parallelism within its own node, but no parallelism across nodes.



Some background: The actual program being executed is Q-Chem 4.0.  I use
OpenMPI 1.4.2 for this, because Q-Chem is notoriously difficult to compile
and this is the last version of OpenMPI that this version of Q-Chem is
known to work with.



My jobs are failing with the error message below; I do not observe this
error when submitting single-node jobs.  From reading the mailing list
archives (http://www.open-mpi.org/community/lists/users/2010/03/12348.php),
I believe OpenMPI is looking for a PBS node file somewhere.  Since each of
my tasks is only parallel over the node it is running on, I believe that a
node file of any kind is unnecessary.
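
To help narrow this down, a small diagnostic along the following lines could
be run from inside the job on the failing node.  It is only a sketch: it
assumes Torque exports its usual PBS_* environment variables (PBS_NODEFILE in
particular), the function name summarize_pbs_env is my own invention, and it
says nothing about what OpenMPI itself actually checks.  It simply reports
which PBS_* variables the task can see and whether the node file they point
to can be opened.

```python
import os


def summarize_pbs_env(env):
    """Report the PBS_* variables in `env` and basic facts about the node file.

    `env` is any mapping of environment variables (e.g. os.environ).
    This is a hypothetical helper, not part of OpenMPI, Torque or Q-Chem.
    """
    # Collect every variable whose name starts with PBS_.
    pbs_vars = {k: v for k, v in env.items() if k.startswith("PBS_")}

    # Assumption: Torque advertises the node file through PBS_NODEFILE.
    nodefile = env.get("PBS_NODEFILE")
    hosts = []
    readable = False
    if nodefile and os.path.isfile(nodefile):
        try:
            with open(nodefile) as fh:
                # One hostname per line; duplicates mean several slots per host.
                hosts = sorted({line.strip() for line in fh if line.strip()})
            readable = True
        except OSError:
            readable = False

    return {
        "pbs_vars": pbs_vars,
        "nodefile": nodefile,
        "nodefile_readable": readable,
        "hosts": hosts,
    }


if __name__ == "__main__":
    report = summarize_pbs_env(os.environ)
    for name, value in sorted(report["pbs_vars"].items()):
        print(f"{name}={value}")
    print("node file:", report["nodefile"])
    print("node file readable:", report["nodefile_readable"])
    print("distinct hosts:", report["hosts"])
```

Comparing its output between a single-node and a multi-node submission should
show whether the two environments differ in any PBS_* variable or in the
visibility of the node file.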



My questions are: Why does OpenMPI behave differently when I submit a
multi-node job compared to a single-node job?  How does OpenMPI detect that
it is running under a multi-node allocation?  Is there a way I can change
OpenMPI's behavior so it always thinks it's running on a single node,
regardless of the type of job I submit to the batch system?



Thank you,



- Lee-Ping Wang (Postdoc in Dept. of Chemistry, Stanford University)



[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
