On Nov 3, 2011, at 8:54 AM, Blosch, Edwin L wrote:

> Can anyone guess what the problem is here?  I was under the impression that 
> OpenMPI (1.4.4) would look for /tmp and would create its shared-memory 
> backing file there, i.e. if you don’t set orte_tmpdir_base to anything.

That is correct.

>  
> Well, there IS a /tmp and yet it appears that OpenMPI has chosen to use 
> /dev/shm.  Why?

Looks like a bug to me - it shouldn't be doing that. Will have to take a look - 
first I've heard of that behavior.
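
Until the cause is tracked down, one thing that might be worth trying is 
setting orte_tmpdir_base explicitly instead of relying on the default. The 
Python sketch below only assembles the mpirun command line. The directory 
/scratch/ompi-tmp, the process count, and ./my_app are hypothetical 
placeholders, not values from this cluster, and whether the override avoids 
the /dev/shm behavior here is untested.

```python
import shlex


def build_mpirun_command(tmpdir_base, nprocs, program):
    """Return an argv list that passes orte_tmpdir_base as an MCA parameter.

    All three arguments are caller-supplied; nothing here is specific to
    the cluster discussed in this thread.
    """
    return [
        "mpirun",
        "--mca", "orte_tmpdir_base", tmpdir_base,  # explicit session-dir base
        "-np", str(nprocs),
        program,
    ]


def as_shell_line(argv):
    """Quote the argv list so it can be pasted into a shell."""
    return shlex.join(argv)


# Hypothetical values, for illustration only.
example = build_mpirun_command("/scratch/ompi-tmp", 4, "./my_app")
print(as_shell_line(example))
```

The chosen directory would need to exist and be writable by the job's user on 
every node.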


>  
> And, next question, why doesn’t it work?  Here are the oddities of this 
> cluster:
> -    the cluster is ‘diskless’
> -    /tmp is an NFS mount
> -    /dev/shm is 12 GB and has 755 permissions
>  
> Filesystem            Size  Used Avail Use% Mounted on
> tmpfs                  12G  164K   12G   1% /dev/shm
>  
> % ls -l output:
> drwxr-xr-x  2 root root         40 Oct 28 09:14 shm
>  
>  
> The error message below suggests that OpenMPI (1.4.4) has somehow 
> auto-magically decided to use /dev/shm and is failing to be able to use it, 
> for some reason.
>  
> Thanks for whatever help you can offer,
>  
> Ed
>  
>  
> [e8315:02942] opal_os_dirpath_create: Error: Unable to create the 
> sub-directory (/dev/shm/openmpi-sessions-estenfte@e8315_0) of 
> (/dev/shm/openmpi-sessions-estenfte@e8315_0/8474/0/1), mkdir failed [1]
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c 
> at line 106
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c 
> at line 399
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file 
> base/ess_base_std_orted.c at line 206
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>  
>   orte_session_dir failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file ess_env_module.c at 
> line 136
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file runtime/orte_init.c 
> at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>  
>   orte_ess_set_name failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file orted/orted_main.c 
> at line 325
>  
>  
>  
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
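
On the "why doesn't it work" part: the ls -l output quoted above shows 
/dev/shm as drwxr-xr-x owned by root, so an ordinary user has no write bit on 
it. That would be consistent with the mkdir failure in the log. The Python 
sketch below is only a simplified model of the classic Unix permission check. 
The function name is made up for illustration, and it ignores root's override 
and any ACLs.

```python
import stat


def mode_allows_mkdir(mode, is_owner=False, in_group=False):
    """Simplified check: may this user create an entry in a directory?

    Creating a sub-directory needs both write and search (execute)
    permission on the parent. Exactly one permission class applies:
    owner, else group, else other. Root and ACLs are not modeled.
    """
    if is_owner:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif in_group:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (mode & need) == need


# 0o755 is the mode reported for /dev/shm in the thread.
print(mode_allows_mkdir(0o755))                 # non-owner, non-group user
print(mode_allows_mkdir(0o755, is_owner=True))  # the owner (root here)
print(mode_allows_mkdir(0o1777))                # world-writable with sticky bit
```

On a live node the mode could be read with 
stat.S_IMODE(os.stat("/dev/shm").st_mode) and passed to the same function.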
