Hi Noam,

Could it be that Torque, or probably more likely NFS, is too slow to create/make available the PBS_NODEFILE?
What if you insert a "sleep 2", or whatever number of seconds you want, before the mpiexec command line? Or maybe better, a "ls -l $PBS_NODEFILE; cat $PBS_NODEFILE", just to make sure the file is available and filled with the node list before mpiexec takes over?

My two cents,
Gus Correa

On 09/20/2013 09:55 AM, Noam Bernstein wrote:
Hi - we've been using openmpi for a while, but only for the last few months with torque/maui. Intermittently (maybe 1/10 jobs), we get mpi jobs that fail with the error:

[compute-2-4:32448] [[52041,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 142
[compute-2-4:32448] [[52041,0],0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 82
[compute-2-4:32448] [[52041,0],0] ORTE_ERROR_LOG: File open failure in file base/ras_base_allocate.c at line 149
[compute-2-4:32448] [[52041,0],0] ORTE_ERROR_LOG: File open failure in file base/plm_base_launch_support.c at line 99
[compute-2-4:32448] [[52041,0],0] ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line 194

This is completely unrepeatable - resubmitting the same job almost always works the second time around. The line appears to be associated with looking for the torque/maui generated node file, and when I do something like

echo $PBS_NODEFILE
cat $PBS_NODEFILE

it appears that the file is present and correct. We're running OpenMPI 1.6.4, configured with

./configure \
    --prefix=${DEST} \
    --with-tm=/usr/local/torque \
    --enable-mpirun-prefix-by-default \
    --with-openib=/usr \
    --with-openib-libdir=/usr/lib64

Has anyone seen anything like this before, or has any ideas of what might be happening? It appears to be a line where openmpi looks for the PBS node file, which is on a local filesystem (e.g. PBS_NODEFILE=/var/spool/torque/aux//4600.tin).

thanks,
Noam

Noam Bernstein
Center for Computational Materials Science
NRL Code 6390
noam.bernst...@nrl.navy.mil
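
The check suggested in the reply at the top of this thread (a short wait, then "ls -l" and "cat" on the node file before mpiexec runs) could be sketched in the job script roughly as below. This is only a diagnostic sketch: it assumes the failure is a timing issue, which nothing in the thread confirms. The function name wait_for_nodefile, the 10-second default and the ./my_app program name are made up for illustration and are not part of Torque or Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or Open MPI): poll until the file
# named by $1 exists and is non-empty, or give up after $2 seconds (default 10).
wait_for_nodefile() {
    file=$1
    timeout=${2:-10}
    waited=0
    while [ ! -s "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "node file '$file' still missing or empty after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Record what the job actually saw, so a failed run leaves evidence.
    ls -l "$file"
    cat "$file"
}

# Intended use inside the job script (./my_app is a placeholder):
#   wait_for_nodefile "$PBS_NODEFILE" 10 && mpiexec ./my_app
```

If the function times out in a job that would otherwise have hit the ORTE_ERROR_LOG failure, that points towards the node file not being ready yet; if the file is listed and printed correctly and mpiexec still fails, the cause is probably elsewhere.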