Dear List,

I've been struggling with this problem for a few days now and am out of ideas. I am submitting a job using TORQUE on a Beowulf cluster. One step involves running mpiexec, and that is where this error occurs. I've found some similar queries from the past:
http://www.open-mpi.org/community/lists/users/att-11378/attachment
http://www.open-mpi.org/community/lists/users/2013/09/22608.php
http://www.open-mpi.org/community/lists/users/2009/11/11129.php

I'm new to Open MPI, so much of this is unfamiliar to me. However, as far as I can tell, my /tmp folder is not full. I've tried reassigning the temporary directory (http://www.open-mpi.org/faq/?category=sm#where-sm-file) using an MCA parameter (http://www.open-mpi.org/faq/?category=tuning#mca-def), i.e.

    mpiexec --mca orte_tmpdir_base /home/pathA/pathB process argument1 argument2 argument3

but that was unsuccessful as well. Similarly, if this is the ext3 limit on the number of subdirectories in a single directory (i.e., thousands of subdirectories being created), I have no idea where those subdirectories would be.

It's worth noting that the same job works on some occasions and not on others. I suspect it has something to do with the nodes I am assigned, and that some property of certain nodes is the problem. I never had this problem until a few days ago; now I mostly can't get it to work, except on a few occasions, which makes me think it is a node-specific issue. Any thoughts or suggestions would be much appreciated!
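One way to check the /tmp theories above on a per-node basis is a small script run once on each assigned node (for example, launched from the job script with TORQUE's `pbsdsh`) and compared between nodes where the job fails and nodes where it works. The following is a minimal sketch, assuming Python 3.6+ is available on the compute nodes. `check_tmp` is a hypothetical helper, not part of Open MPI. The `openmpi-sessions-` prefix is taken from the error log. The ext3 cap of about 31,998 subdirectories per directory (from its 32,000 hard-link limit) is an assumption that only matters if /tmp is actually ext3.

```python
import os
import socket
import sys
import tempfile

# Assumption: ext3 allows 32000 hard links per inode, which caps a single
# directory at roughly 31998 subdirectories. Only relevant if /tmp is ext3.
EXT3_MAX_SUBDIRS = 31998


def check_tmp(path="/tmp", prefix="openmpi-sessions-"):
    """Collect basic health facts about a scratch directory on this node."""
    st = os.statvfs(path)
    report = {
        "host": socket.gethostname(),
        "path": path,
        "free_bytes": st.f_bavail * st.f_frsize,  # space available to non-root users
        "free_inodes": st.f_favail,               # a full inode table also breaks mkdir
        "writable": os.access(path, os.W_OK | os.X_OK),
        "subdirs": 0,
        "session_dirs": 0,
        "mkdir_ok": False,
    }
    # Count all subdirectories, and Open MPI session directories in particular.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                report["subdirs"] += 1
                if entry.name.startswith(prefix):
                    report["session_dirs"] += 1
    # Repeat the operation that failed in the log: creating a subdirectory.
    try:
        probe = tempfile.mkdtemp(prefix="mkdir-probe-", dir=path)
        os.rmdir(probe)
        report["mkdir_ok"] = True
    except OSError as exc:
        report["mkdir_error"] = str(exc)
    report["near_ext3_subdir_limit"] = report["subdirs"] >= EXT3_MAX_SUBDIRS - 1000
    return report


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    for key, value in check_tmp(target).items():
        print(f"{key}: {value}")
```

If `mkdir_ok` is false, `free_inodes` is zero, or `session_dirs` is very large only on the nodes where the job fails, that would point at a per-node /tmp problem (for example, stale session directories left behind by killed jobs) rather than at the mpiexec command line.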
Thanks,
Brandon

PS I've copied the full error output below:

[bc11bl08.deac.wfu.edu:31532] opal_os_dirpath_create: Error: Unable to create the sub-directory (/tmp/openmpi-sessions-turn...@bc11bl08.deac.wfu.edu_0) of (/tmp/openmpi-sessions-turn...@bc11bl08.deac.wfu.edu_0/2243/0/7), mkdir failed [1]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file ../../orte/util/session_dir.c at line 106
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file ../../orte/util/session_dir.c at line 399
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file ../../../../orte/mca/ess/base/ess_base_std_orted.c at line 283
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 104
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] could not get route to [[INVALID],INVALID]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../orte/util/show_help.c at line 627
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/ess/tm/ess_tm_module.c at line 112
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 104
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] could not get route to [[INVALID],INVALID]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../orte/util/show_help.c at line 627
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file ../../orte/runtime/orte_init.c at line 128
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 104
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] could not get route to [[INVALID],INVALID]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../orte/util/show_help.c at line 627
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file ../../orte/orted/orted_main.c at line 357
=>> PBS: job killed: walltime 3626 exceeded limit 3600
Terminated
mpiexec: killing job...
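To test the suspicion that particular nodes are to blame, one could keep the output of every failed submission and tally which hosts report the `opal_os_dirpath_create` error. The following is a minimal sketch, assuming each run's output is saved to its own file. `failing_hosts` is a hypothetical helper, and the regular expression only matches the message format seen in the log above.

```python
import re
import sys
from collections import Counter

# Matches the mkdir failure line from the log, e.g.
# "[bc11bl08.deac.wfu.edu:31532] opal_os_dirpath_create: Error: Unable to create ..."
DIRPATH_ERROR = re.compile(
    r"\[(?P<host>[^:\]\s]+):(?P<pid>\d+)\]\s+opal_os_dirpath_create: Error"
)


def failing_hosts(log_text):
    """Count opal_os_dirpath_create failures per host in one job's output."""
    return Counter(m.group("host") for m in DIRPATH_ERROR.finditer(log_text))


if __name__ == "__main__":
    totals = Counter()
    # Each command-line argument is assumed to be one job's saved output file.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            totals.update(failing_hosts(fh.read()))
    for host, count in totals.most_common():
        print(f"{host}\t{count}")
```

A host that shows up in every failed run, and never in the node list of a successful one, would be a reasonable candidate to raise with the cluster administrators.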