On 16.01.2009 at 23:06, Reuti wrote:
On 16.01.2009 at 22:20, Jeff Dusenberry wrote:
Reuti wrote:
On 15.01.2009 at 16:20, Jeff Dusenberry wrote:
I'm trying to launch multiple xterms under OpenMPI 1.2.8 and the
SGE job scheduler for purposes of running a serial debugger.
I'm experiencing file-locking problems on the .Xauthority file.
I tried to fix this by asking for a delay between successive
launches, to reduce the chances of contention for the lock:
~$ qrsh -pe mpi 4 -P CIS /share/apps/openmpi/bin/mpiexec \
     --mca pls_rsh_debug 1 --mca pls_rsh_delay 5 xterm
The 'pls_rsh_delay 5' parameter seems to have no effect. I
tried replacing 'pls_rsh_debug 1' with 'orte_debug 1', which
gave me additional debugging output, but didn't fix the file
locking problem.
Sometimes the above commands will work and I will get all 4
xterms, but more often I will get an error:
/usr/bin/X11/xauth: error in locking authority file /export/home/duse/.Xauthority
followed by
X11 connection rejected because of wrong authentication.
xterm Xt error: Can't open display: localhost:11.0
and one or more of the xterms will fail to open.
Am I missing something? Is there another debug flag I need to
set? Any suggestions for a better way to do this would be
appreciated.
You are right that it's neither Open MPI's nor SGE's fault, but
a race condition in the SSH startup. You defined SSH with X11
forwarding in SGE (qconf -mconf), right? Then you first have an
ssh connection from your workstation to the login machine, then
one from the login machine to the node where mpiexec runs, and
then one for each slave node (which means an additional one on the
machine where mpiexec is already running).
Yes, that's all correct. Clearly not very efficient, but I
haven't had any luck getting xauth or xhost to work more directly.
Although it might be possible to give every started sshd a
unique .Xauthority file, it's not straightforward to implement
because of the way SGE starts the daemons, and you would need a
sophisticated ~/.ssh/rc to create the files in different locations
and use them in the subsequent xterm.
Thanks, that helped a lot, but I still can't quite get it to
work. I do want the xterms to run mpi jobs.
Do you need the X11 forwarding then for your application, and xterm
was just an example?
I tried this sshrc script (modified from the sshd man page):
XAUTHORITY=/local/$USER/.Xauthority${SSH_TTY##*/}
export XAUTHORITY
if read proto cookie && [ -n "$DISPLAY" ]; then
    if [ `echo $DISPLAY | cut -c1-10` = 'localhost:' ]; then
        # X11UseLocalhost=yes
        echo add unix:`echo $DISPLAY | cut -c11-` $proto $cookie
    else
        # X11UseLocalhost=no
        echo add $DISPLAY $proto $cookie
    fi | xauth -q -
fi
Yes, but the created session also needs it. I mean: you log in to a
node with the above script. Then in the shell you execute:
$ xauth list
and you will get the default ~/.Xauthority. In the shell you also
need to export the above variable to get the listing of the newly
created Xauthority file from the correct location. You can add:
export XAUTHORITY=/local/$USER/.Xauthority${SSH_TTY##*/}
to .bashrc and .profile (for non-interactive [mpiexec] and
interactive use).
As for the SGE SSH_TTY issue I mentioned, it's not straightforward.
When SSH starts, nothing has been defined by SGE yet. You could try
to look in the process chain (to see whether it's running under SGE),
but that isn't pretty. I'll look into another solution and let you
know when I've found something.
One option is to have SSH send and accept an environment
variable and use that instead of SSH_TTY, i.e. in SGE's setup:
rsh_command /usr/bin/ssh -osendenv=rank
and in the sshd_config:
AcceptEnv rank
Now the environment variable rank must be set for each MPI process,
and it should work.
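
A minimal sketch of how the forwarded variable could replace SSH_TTY when
choosing the per-session file. The function name xauth_path is hypothetical,
the variable name rank comes from the SendEnv/AcceptEnv lines above, and the
/local/$USER directory is the one used earlier in this thread. It assumes the
accepted variable is visible both in ~/.ssh/rc and in the later shell, which
has not been verified under SGE:

```shell
# Hypothetical helper: pick a per-session Xauthority path.
# Preference order: forwarded "rank" variable, then SSH_TTY, then the
# default file (the default brings back the shared-file contention).
xauth_path() {
    user="${USER:-$(id -un)}"
    if [ -n "${rank:-}" ]; then
        # Set via "ssh -o SendEnv=rank" plus "AcceptEnv rank" (assumption).
        printf '%s\n' "/local/${user}/.Xauthority.rank${rank}"
    elif [ -n "${SSH_TTY:-}" ]; then
        # Only available when sshd allocated a terminal for the session.
        printf '%s\n' "/local/${user}/.Xauthority.${SSH_TTY##*/}"
    else
        printf '%s\n' "${HOME}/.Xauthority"
    fi
}

XAUTHORITY=$(xauth_path)
export XAUTHORITY
```

The same lines would have to appear in ~/.ssh/rc (before the xauth call) and
in .bashrc/.profile, so that the file written at login is the one the xterm
later reads.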
-- Reuti
-- Reuti
and I am successful in creating a unique .Xauthority for each
process locally on each node when I log in via ssh directly.
Unfortunately, I do have to provide another definition of
XAUTHORITY somewhere in my startup scripts - the one above does
not get seen outside of the sshrc execution.
When I try to run this under qrsh/mpiexec, it acts as if it
doesn't have the SSH_TTY environment variable (is that due to
SGE?), and we're back to a race condition. Is there another
variable I can use in the sge/mpi context? I also don't
understand where I would define the XAUTHORITY variable when
running under mpiexec.
I'm not sure this is the best way to approach this - I was
originally hoping that the mpiexec call would have a way to
introduce a delay between successive launches but that doesn't
seem to be working either.
Jeff
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users