On Nov 3, 2011, at 8:54 AM, Blosch, Edwin L wrote:

> Can anyone guess what the problem is here? I was under the impression that
> OpenMPI (1.4.4) would look for /tmp and would create its shared-memory
> backing file there, i.e. if you don’t set orte_tmpdir_base to anything.

That is correct.

> Well, there IS a /tmp and yet it appears that OpenMPI has chosen to use
> /dev/shm. Why?

Looks like a bug to me - it shouldn't be doing that. Will have to take a look - first I've heard of that behavior.

> And, next question, why doesn’t it work? Here are the oddities of this
> cluster:
> - the cluster is ‘diskless’
> - /tmp is an NFS mount
> - /dev/shm is 12 GB and has 755 permissions
>
> Filesystem            Size  Used Avail Use% Mounted on
> tmpfs                  12G  164K   12G   1% /dev/shm
>
> % ls -l output:
> drwxr-xr-x  2 root root  40 Oct 28 09:14 shm
>
> The error message below suggests that OpenMPI (1.4.4) has somehow
> auto-magically decided to use /dev/shm and is failing to be able to use it,
> for some reason.
>
> Thanks for whatever help you can offer,
>
> Ed
>
> [e8315:02942] opal_os_dirpath_create: Error: Unable to create the
> sub-directory (/dev/shm/openmpi-sessions-estenfte@e8315_0) of
> (/dev/shm/openmpi-sessions-estenfte@e8315_0/8474/0/1), mkdir failed [1]
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c
> at line 106
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c
> at line 399
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file
> base/ess_base_std_orted.c at line 206
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.
> This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_session_dir failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file ess_env_module.c at
> line 136
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file runtime/orte_init.c
> at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_set_name failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file orted/orted_main.c
> at line 325
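
One observation on the listing quoted above: a directory with mode 755 owned by root is ordinarily not writable by other users, so a mkdir under it by a non-root user would be expected to fail regardless of why that location was chosen. As a rough way to check this on a node, here is a minimal Python sketch that tries to create and then remove a scratch subdirectory under a candidate base directory. It is not part of Open MPI, and the function name can_create_subdir is made up for illustration.

```python
import os
import tempfile


def can_create_subdir(base):
    """Return True if the current user can create a subdirectory under base.

    Hypothetical helper for diagnosis only; it does not reproduce how
    Open MPI selects or creates its session directory.
    """
    try:
        # mkdtemp performs a real mkdir, so permission problems surface here.
        probe = tempfile.mkdtemp(prefix="probe-", dir=base)
    except OSError:
        return False
    os.rmdir(probe)  # clean up the empty probe directory
    return True


if __name__ == "__main__":
    # The two locations discussed in this thread.
    for base in ("/tmp", "/dev/shm"):
        print(base, can_create_subdir(base))
```

Run as the same user that launches the MPI job, on a compute node; a False for /dev/shm would be consistent with the 755 root-owned listing above.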