Hi there,
Recently, I've begun some calculations on a cluster where I submit a multiple node job to the Torque batch system, and the job executes multiple single-node parallel tasks. That is to say, these tasks are intended to use OpenMPI parallelism on each node, but no parallelism across nodes.

Some background: The actual program being executed is Q-Chem 4.0. I use OpenMPI 1.4.2 for this, because Q-Chem is notoriously difficult to compile and this is the last known version of OpenMPI that this version of Q-Chem is known to work with.

My jobs are failing with the error message below; I do not observe this error when submitting single-node jobs. From reading the mailing list archives (http://www.open-mpi.org/community/lists/users/2010/03/12348.php), I believe it is looking for a PBS node file somewhere. Since my jobs are only parallel over the node they're running on, I believe that a node file of any kind is unnecessary.

My question is:

Why is OpenMPI behaving differently when I submit a multi-node job compared to a single-node job?

How does OpenMPI detect that it is running under a multi-node allocation?

Is there a way I can change OpenMPI's behavior so it always thinks it's running on a single node, regardless of the type of job I submit to the batch system?

Thank you,

- Lee-Ping Wang (Postdoc in Dept. of Chemistry, Stanford University)

[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 153
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 133
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 72
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
[compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
[compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 167
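
In case it helps with diagnosis: since my guess is that the failure comes from a PBS node file that cannot be opened, a small check like the one below could be run inside the job script on each node, just before Q-Chem starts. This is only a sketch. `PBS_NODEFILE` is the variable Torque normally sets to the path of the node file; the function name `check_nodefile` is my own, and I am only assuming that this is the file the `ras_tm` code is trying to open.

```shell
# Hypothetical helper: report whether the Torque node file is visible
# to the shell that is about to launch the MPI task.
check_nodefile() {
    if [ -z "${PBS_NODEFILE:-}" ]; then
        # Variable missing or empty in this environment.
        echo "PBS_NODEFILE is not set"
    elif [ ! -r "$PBS_NODEFILE" ]; then
        # Variable is set, but the path does not exist or cannot be read here.
        echo "PBS_NODEFILE is set but not readable: $PBS_NODEFILE"
    else
        # The node file has one line per slot; count the distinct host names.
        echo "PBS_NODEFILE lists $(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ') distinct host(s)"
    fi
}

check_nodefile
```

If the variable is set but the file is unreadable on the nodes other than the first, that would at least be consistent with the "File open failure" messages appearing only in multi-node jobs.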