Dear List,

I've been struggling with this problem for a few days now and am out of
ideas. I am submitting a job using TORQUE on a beowulf cluster. One step
involves running mpiexec, and that is where this error occurs. I've found
some similar other queries in the past:

http://www.open-mpi.org/community/lists/users/att-11378/attachment

http://www.open-mpi.org/community/lists/users/2013/09/22608.php

http://www.open-mpi.org/community/lists/users/2009/11/11129.php

I'm new to using open-mpi so much of this is very new to me. However, it
does not seem that my /tmp folder is full as far as I can tell. I've tried
reassigning <http://www.open-mpi.org/faq/?category=sm#where-sm-file> the
temporary directory using the MCA
attribute<http://www.open-mpi.org/faq/?category=tuning#mca-def>(i.e.
mpiexec --mca orte_tmpdir_base /home/pathA/pathB process argument1
argument2 argument3), but that was unsuccessful as well. Similarly, if
thousands of sub-directories are being created, I have no idea where those
would be if this is some ext3 violation issue. It's worth noting that when
I submit this job--it works on some occassions and not on others. I suspect
it has something to do with the nodes that I am assigned and some property
of certain nodes that is an issue.

It never used to have this problem until a few days ago, and now I mostly
can't get it to work except on a few occasions, which makes me think that
perhaps it is a node-specific issue. Any thoughts or suggestions would be
much appreciated!

Thanks,

Brandon

PS I've copied the full error output below:
[bc11bl08.deac.wfu.edu:31532] opal_os_dirpath_create: Error: Unable to
create the sub-directory
(/tmp/openmpi-sessions-turn...@bc11bl08.deac.wfu.edu_0) of
(/tmp/openmpi-sessions-turn...@bc11bl08.deac.wfu.edu_0/2243/0/7), mkdir
failed [1]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file
../../orte/util/session_dir.c at line 106
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file
../../orte/util/session_dir.c at line 399
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file
../../../../orte/mca/ess/base/ess_base_std_orted.c at line 283
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is unknown in
file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 104
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] could not get route to
[[INVALID],INVALID]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is unknown in
file ../../orte/util/show_help.c at line 627
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file
../../../../../orte/mca/ess/tm/ess_tm_module.c at line 112
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is unknown in
file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 104
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] could not get route to
[[INVALID],INVALID]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is unknown in
file ../../orte/util/show_help.c at line 627
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file
../../orte/runtime/orte_init.c at line 128
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is unknown in
file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 104
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] could not get route to
[[INVALID],INVALID]
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is unknown in
file ../../orte/util/show_help.c at line 627
[bc11bl08.deac.wfu.edu:31532] [[2243,0],7] ORTE_ERROR_LOG: Error in file
../../orte/orted/orted_main.c at line 357
=>> PBS: job killed: walltime 3626 exceeded limit 3600
Terminated
mpiexec: killing job...

Reply via email to