Thanks for your help, Ralph. I'll double-check that.
As for the error message received, there might be some inconsistency:
"/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0" is the parent
directory and
"/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0/53199/0/0" is
the subdirectory... not the other way around.
Eloi
Ralph Castain wrote:
Creating a directory with such credentials sounds like a bug in SGE to
me...perhaps an SGE config issue?
Only thing you could do is tell OMPI to use some other directory as
the root for its session dir tree - check "mpirun -h", or ompi_info
for the required option.
But I would first check your SGE config as that just doesn't sound right.
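One way to look for that option is to filter the ompi_info output for parameters that mention the temporary directory. This is only a rough sketch: it assumes ompi_info is on the PATH and that "ompi_info --param all all" lists the parameters as it does in the 1.3 series. The search keyword "tmpdir" and the helper names are assumptions, not something stated in this thread.

```python
import shutil
import subprocess


def matching_lines(text, needle):
    """Return the lines of text that contain needle, ignoring case."""
    return [line for line in text.splitlines() if needle.lower() in line.lower()]


def list_params(needle="tmpdir"):
    """Run ompi_info and keep only the lines that mention needle.

    The keyword is a guess; adjust it after reading the full output.
    """
    if shutil.which("ompi_info") is None:
        return []  # Open MPI is not on PATH, so there is nothing to query
    result = subprocess.run(
        ["ompi_info", "--param", "all", "all"],
        capture_output=True,
        text=True,
        check=False,
    )
    return matching_lines(result.stdout, needle)


if __name__ == "__main__":
    for line in list_params():
        print(line)
```

Any parameter found this way still needs to be checked against "mpirun -h" before relying on it.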
On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
Hi there,
I'm experiencing some issues using GE6.2U4 and OpenMPI-1.3.3 (with
gridengine component).
During any job submission, SGE creates a session directory in
$TMPDIR, named after the job id and the computing node name. This
session directory is created using nobody/nogroup credentials.
When using OpenMPI with tight-integration, opal creates different
subdirectories in this session directory. The issue I'm facing now is
that OpenMPI fails to create these subdirectories:
[charlie:03882] opal_os_dirpath_create: Error: Unable to create the
sub-directory
(/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0) of
(/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0
[charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
../../openmpi-1.3.3/orte/util/session_dir.c at line 101
[charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
../../openmpi-1.3.3/orte/util/session_dir.c at line 425
[charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c at
line 273
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_session_dir failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_set_name failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line 473
This seems very likely related to the permissions set on $TMPDIR.
I'd like to know if someone might have experienced the same or a
similar issue and if any solution was found.
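To confirm the permissions theory, a small script run from inside the job (so that $TMPDIR is the directory SGE created) can report who owns the directory and whether the job's user may create entries in it. This is a minimal sketch, not part of Open MPI or SGE; the function name describe_dir is made up for illustration, and the fallback to the system temporary directory is an assumption for running it outside a job.

```python
import os
import stat
import tempfile


def describe_dir(path):
    """Return ownership, mode and access facts for a directory."""
    st = os.stat(path)
    return {
        "path": path,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        # Creating an entry in a directory needs both write and
        # search (execute) permission on that directory.
        "can_create_entries": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Inside an SGE job, TMPDIR should point at the per-job directory;
    # outside a job this falls back to the system default (assumption).
    target = os.environ.get("TMPDIR") or tempfile.gettempdir()
    for key, value in describe_dir(target).items():
        print(f"{key}: {value}")
```

If can_create_entries comes back False while the uid/gid belong to nobody/nogroup, that would be consistent with opal being unable to create its session subdirectories.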
Thanks for your help,
Eloi
--
Eloi Gaudry
Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM
Company Phone: +32 10 487 959
Company Fax: +32 10 454 626
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users