I have run into the same problem described in this thread: http://www.open-mpi.org/community/lists/users/2009/12/11374.php
The solution given in that thread is to use Open MPI v1.4 instead of v1.3. However, I am using Open MPI v1.7a1r22794 and still hit the same problem.

Here is what I did. My cluster consists of two machines: nimbus (master) and nimbus1 (slave). When I run

    mpirun -np 40 -am ft-enable-cr --hostfile .mpihostfile myapplication

on nimbus, it does not work and prints:

[nimbus1:21387] opal_os_dirpath_create: Error: Unable to create the sub-directory (/tmp/openmpi-sessions-mpiu@nimbus1_0/59759) of (/tmp/openmpi-sessions-mpiu@nimbus1_0/59759/0/1), mkdir failed [1]
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 106
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 399
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: Error in file base/ess_base_std_orted.c at line 301
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 104
[nimbus1:21387] [[59759,0],1] could not get route to [[INVALID],INVALID]
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file util/show_help.c at line 602
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: Error in file ess_env_module.c at line 143
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 104
[nimbus1:21387] [[59759,0],1] could not get route to [[INVALID],INVALID]
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file util/show_help.c at line 602
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 129
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 104
[nimbus1:21387] [[59759,0],1] could not get route to [[INVALID],INVALID]
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file util/show_help.c at line 602
[nimbus1:21387] [[59759,0],1] ORTE_ERROR_LOG: Error in file orted/orted_main.c at line 355
--------------------------------------------------------------------------
A daemon (pid 10737) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

cheers
fengguang