Re: [OMPI users] x11 forwarding

2006-11-30 Thread Galen Shipman

what does your command line look like?

- Galen

On Nov 29, 2006, at 7:53 PM, Dave Grote wrote:



I cannot get X11 forwarding to work using mpirun. I've tried all of the
standard methods, such as setting pls_rsh_agent = ssh -X, using xhost,
and a few other things, but nothing works in general. In the FAQ,
http://www.open-mpi.org/faq/?category=running#mpirun-gui, a reference is
made to other methods, but "they involve sophisticated X forwarding
through mpirun", and no further explanation is given. Can someone tell
me what these other methods are, or point me to where I can find info on
them? I've done lots of Google searching and haven't found anything
useful. This is a major issue, since my parallel code depends heavily on
being able to open X windows on the remote machines. Any and all
help would be appreciated!
  Thanks!
 Dave
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] dual Gigabit ethernet support

2006-11-30 Thread Galen Shipman

Looking VERY briefly at the GAMMA API here:
http://www.disi.unige.it/project/gamma/gamma_api.html

It looks like one could create a GAMMA BTL with a minimal amount of
trouble. I would encourage your group to do this!

There is quite a bit of information regarding the BTL interface, and
for GAMMA it looks like all you would need to start is the send/recv
interfaces. You could do trickier things with the RDMA put/get
interfaces in an attempt to minimize memory copies (we do this with
TCP), but that is not necessary for correctness. Anyway, here is the
current list of docs that explain our P2P layers:


Here is a paper on PML OB1, the upper layer above the BTLs. You
wouldn't need to worry much about this, but it is good to know what we
are doing:

http://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols

This paper also has information about the PML/BTL interactions, from
an IB point of view:

http://www.open-mpi.org/papers/ipdps-2006

For a very detailed presentation on OB1, probably the most relevant,
go here:

http://www.open-mpi.org/papers/workshop-2006/wed_01_pt2pt.pdf


Thanks,

Galen


On Oct 23, 2006, at 4:05 PM, Lisandro Dalcin wrote:


On 10/23/06, Tony Ladd  wrote:

A couple of comments regarding issues raised by this thread.

1) In my opinion, Netpipe is not such a great network benchmarking
tool for HPC applications. It measures timings based on the completion
of the send call on the transmitter, not the completion of the
receive. Thus, if there is a delay in copying the send buffer across
the net, it will report a misleading timing compared with the
wall-clock time. This is particularly problematic with multiple pairs
of edge exchanges, which can oversubscribe most GigE switches. Here
the Netpipe timings can be off by orders of magnitude compared with
the wall clock. The good thing about writing your own code is that you
know what it has done (of course no one else knows, which can be a
problem). But it seems many people are unaware of the timing issue in
Netpipe.


Yes! I've noticed that. I am now using the Intel MPI Benchmark. The
PingPong/PingPing and SendRecv test cases seem to be more realistic.
Does anyone have any comments about this test suite?


2) It's worth distinguishing between Ethernet and TCP/IP. With
MPIGAMMA, the Intel Pro 1000 NIC has a latency of 12 microseconds
including the switch, and a duplex bandwidth of 220 MBytes/sec. With
the Extreme Networks X450a-48t switch we can sustain 220 MBytes/sec
over 48 ports at once. This is not IB performance, but it seems
sufficient to scale a number of applications to the 100-CPU level, and
perhaps beyond.



GAMMA seems to be great work, judging from some of the reports on its
web site. However, I have not tried it yet, and I am not sure I will,
mainly because it only supports MPICH-1. Does anyone have a rough idea
how much work it would be to make it available for Open MPI? It seems
like it would be a very interesting student project...

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594






Re: [OMPI users] x11 forwarding

2006-11-30 Thread Dave Grote





I'm using caos Linux (developed at LBL), which has the wrapper wwmpirun
around mpirun, so my command is something like
wwmpirun -np 8 -- -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"'
/usr/local/bin/pyMPI
This is essentially the same as
mpirun -np 8 -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"'
/usr/local/bin/pyMPI
but wwmpirun does the scheduling, for example looking for idle nodes
and creating the host file.
My system is set up with a master/login node running a full version of
Linux and slave nodes running a reduced Linux (which includes access to
the X libraries). wwmpirun always picks the slave nodes to run on. I've
also tried "ssh -Y", and it doesn't help. I've set xhost for the slave
nodes in my login shell on the master, and that didn't work. X
forwarding is enabled on all of the nodes, so that's not the problem.

I am able to get it to work by having wwmpirun run the command "ssh -X
node xclock" before starting the parallel program on that same node,
but this only works for the first person who logs into the master and
gets DISPLAY=localhost:10. When someone else tries to run a parallel
job, it seems that DISPLAY is set to localhost:10 on the slaves, so the
forwarding goes through the other person's login with the same display
number, and the connection is refused because of wrong authentication.
This seems like very odd behavior. I'm aware that this may be an issue
with the X server (xorg) or with the version of Linux, so I am also
seeking help from the person who maintains caos Linux. If it matters,
the machine uses Myrinet for the interconnects.
  Thanks!
 Dave

Galen Shipman wrote:

  what does your command line look like?

- Galen

On Nov 29, 2006, at 7:53 PM, Dave Grote wrote:

  
  





Re: [OMPI users] x11 forwarding

2006-11-30 Thread Ralph H Castain
Actually, I believe at least some of this may be a bug on our part. We
currently pick up the local environment and forward it on to the remote
nodes as the environment for use by the backend processes. I have seen
quite a few environment variables in that list, including DISPLAY, which
would create the problem you are seeing.

I'll have to chat with folks here to understand what parts of the
environment we absolutely need to carry forward, and what parts we need
to "cleanse" before passing them along.

Ralph


On 11/30/06 10:50 AM, "Dave Grote"  wrote:





Re: [OMPI users] x11 forwarding

2006-11-30 Thread Dave Grote

I don't think that is the problem. As far as I can tell, the DISPLAY
environment variable is being set properly on the slave (it will
sometimes have a different value than in the shell where mpirun was
executed).
  Dave

Ralph H Castain wrote:

  





Re: [OMPI users] For Open MPI + BPROC users

2006-11-30 Thread Marcus G. Daniels

Galen Shipman wrote:

We have found a potential issue with BPROC that may affect Open MPI.
Open MPI by default uses PTYs for I/O forwarding; if PTYs aren't set
up on the compute nodes, Open MPI will revert to using pipes. Recently
(today) we found a potential issue with PTYs and BPROC. A simple
reader/writer using PTYs causes the writer to hang in uninterruptible
sleep. The consistency of the process table between the head node and
the back-end nodes is also affected; that is, "bps" shows no writer
process, while "bpsh NODE ps aux" shows the writer process in
uninterruptible sleep.

Since Open MPI uses PTYs by default on BPROC, this results in ORTED or
MPI processes being orphaned on compute nodes. The workaround for this
issue is to configure Open MPI with --disable-pty-support and rebuild.
  
The mpirun manual says that standard input is redirected from /dev/null,
and that standard output of remote nodes will be attached to the node
that invoked mpirun. If this is all caused by some buglet with BPROC
I/O forwarding, perhaps it would help if the slave nodes were invoked
with the equivalent of "bpsh -N"? I wonder if some people see the
problem and others don't, depending on the stdout (or its absence) from
different applications.