Hi,
On 07.03.2010 at 10:55, Gijsbert Wiesenekker wrote:
I was having non-reproducible hangs in an OpenMPI program. While
troubleshooting this problem I found that there were many temporary
directories in my /tmp/openmpi-sessions-userid directory (probably
left behind by OpenMPI programs that were aborted with MPI_Abort).
I cleaned those directories up and the hangs appear to have gone.
My questions are:
It looks like the name of the temporary directory in the
/tmp/openmpi-sessions-userid directory is a process-id. What happens
when an OpenMPI program starts and the temporary directory in
/tmp/openmpi-sessions-userid already exists?
Could existing temporary directories in /tmp/openmpi-sessions-userid
cause an OpenMPI program to hang?
Is there a way to ensure that the temporary directory created in
/tmp/openmpi-sessions-userid is always removed after an OpenMPI
program has run?
If you use a queuing system like SGE, the session directory will
automatically go into the $TMPDIR that is set for the job, and the
queuing system also removes that directory after the job. Maybe you
can create a $TMPDIR on your own and supply it to the ssh call, so
that it's known and used on all nodes.
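
Without a queuing system, the same idea could be sketched as a small
wrapper: create a fresh directory, export it as TMPDIR for the launched
command, and remove it afterwards no matter how the command ended. This
is only a sketch under assumptions: the function name
run_with_private_tmpdir, the "ompi-job-" prefix and ./my_mpi_program are
made up for illustration, and it only handles the node where the wrapper
itself runs. For remote nodes the directory would still have to be
created and passed along separately.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_with_private_tmpdir(cmd):
    """Run cmd with TMPDIR set to a fresh directory, then delete that directory.

    Returns (returncode, tmpdir) so the caller can verify the cleanup.
    """
    # Hypothetical prefix; any name works as long as the directory is unique.
    tmpdir = tempfile.mkdtemp(prefix="ompi-job-")
    # Copy the current environment and override TMPDIR for the child only.
    env = dict(os.environ, TMPDIR=tmpdir)
    try:
        returncode = subprocess.run(cmd, env=env).returncode
    finally:
        # Executed even if launching the command raises, so no stale
        # directory is left behind on this node.
        shutil.rmtree(tmpdir, ignore_errors=True)
    return returncode, tmpdir


if __name__ == "__main__":
    # Example: python wrapper.py mpirun -np 4 ./my_mpi_program
    if len(sys.argv) < 2:
        sys.exit("usage: wrapper.py command [args...]")
    rc, _ = run_with_private_tmpdir(sys.argv[1:])
    sys.exit(rc)
```

The try/finally is the important part: the cleanup also happens when the
program exits with an error, which is the case that seems to leave the
old directories around.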
-- Reuti