Re: [OMPI users] x11 forwarding
what does your command line look like?

- Galen

On Nov 29, 2006, at 7:53 PM, Dave Grote wrote:

> I cannot get X11 forwarding to work using mpirun. I've tried all of the
> standard methods, such as setting pls_rsh_agent = ssh -X, using xhost, and a
> few other things, but nothing works in general. In the FAQ,
> http://www.open-mpi.org/faq/?category=running#mpirun-gui, a reference is
> made to other methods, but "they involve sophisticated X forwarding through
> mpirun", and no further explanation is given. Can someone tell me what these
> other methods are, or point me to where I can find info on them? I've done
> lots of Google searching and haven't found anything useful. This is a major
> issue, since my parallel code depends heavily on being able to open X
> windows on the remote machine. Any and all help would be appreciated!
>
> Thanks!
> Dave
Re: [OMPI users] dual Gigabit ethernet support
Looking VERY briefly at the GAMMA API here:

http://www.disi.unige.it/project/gamma/gamma_api.html

It looks like one could create a GAMMA BTL with a minimal amount of trouble. I
would encourage your group to do this! There is quite a bit of information
regarding the BTL interface, and for GAMMA it looks like all you would need is
the send/recv interfaces to start. You could do trickier things with the RDMA
put/get interfaces in an attempt to minimize memory copies (we do this with
TCP), but this is not necessary for correctness.

Anyway, here is the current list of docs that explain our point-to-point
layers.

A paper on the PML OB1, the upper layer above the BTLs; you wouldn't need to
worry much about this, but it is good to know what we are doing:
http://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols

This paper also has some information about the PML/BTL interactions, from an
IB point of view:
http://www.open-mpi.org/papers/ipdps-2006

For a very detailed presentation on OB1 (probably the most relevant), go here:
http://www.open-mpi.org/papers/workshop-2006/wed_01_pt2pt.pdf

Thanks,

Galen

On Oct 23, 2006, at 4:05 PM, Lisandro Dalcin wrote:

> On 10/23/06, Tony Ladd wrote:
>> A couple of comments regarding issues raised by this thread.
>>
>> 1) In my opinion Netpipe is not such a great network benchmarking tool for
>> HPC applications. It measures timings based on the completion of the send
>> call on the transmitter, not the completion of the receive. Thus, if there
>> is a delay in copying the send buffer across the net, it will report a
>> misleading timing compared with the wall-clock time. This is particularly
>> problematic with multiple pairs of edge exchanges, which can oversubscribe
>> most GigE switches. Here the Netpipe timings can be off by orders of
>> magnitude compared with the wall clock. The good thing about writing your
>> own code is that you know what it has done (of course no one else knows,
>> which can be a problem). But it seems many people are unaware of the timing
>> issue in Netpipe.
>
> Yes! I've noticed that. I am now using the Intel MPI Benchmark. The
> PingPong/PingPing and SendRecv test cases seem to be more realistic. Does
> anyone have any comments about this test suite?
>
>> 2) It's worth distinguishing between Ethernet and TCP/IP. With MPIGAMMA,
>> the Intel Pro 1000 NIC has a latency of 12 microseconds including the
>> switch, and a duplex bandwidth of 220 MBytes/sec. With the Extreme Networks
>> X450a-48t switch we can sustain 220 MBytes/sec over 48 ports at once. This
>> is not IB performance, but it seems sufficient to scale a number of
>> applications to the 100-CPU level, and perhaps beyond.
>
> GAMMA seems to be a great piece of work, judging from some of the reports on
> its web site. However, I have not tried it yet, and I am not sure if I will,
> mainly because it only supports MPICH-1. Does anyone have a rough idea of
> how much work it would be to make it available for Open MPI? It seems to be
> a very interesting student project...
>
> --
> Lisandro Dalcín
> ---
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594
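For anyone wanting to reproduce Tony's point about send-side timing: a
round-trip (ping-pong) measurement only completes once the data has actually
arrived back at the sender, so it reflects wall-clock delivery rather than the
moment the local send call returned. The following is only a minimal
hand-rolled sketch of that style of measurement, not code from IMB or Netpipe:

/* Minimal ping-pong timing sketch (not IMB/Netpipe): rank 0 times a full
 * round trip, so the reported latency includes actual delivery, not just
 * the local completion of MPI_Send. Run with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, iters = 1000, nbytes = 1024;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2)
        MPI_Abort(MPI_COMM_WORLD, 1);

    buf = malloc(nbytes);
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("half round-trip: %.2f usec for %d bytes\n",
               (t1 - t0) / (2.0 * iters) * 1e6, nbytes);
    free(buf);
    MPI_Finalize();
    return 0;
}

Run with two ranks (e.g. mpirun -np 2 and the resulting binary) and compare
the reported latency against what a send-side-only timer would claim.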
Re: [OMPI users] x11 forwarding
I'm using caos linux (developed at LBL), which has the wrapper wwmpirun around
mpirun, so my command is something like

  wwmpirun -np 8 -- -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"' /usr/local/bin/pyMPI

This is essentially the same as

  mpirun -np 8 -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"' /usr/local/bin/pyMPI

but wwmpirun does the scheduling, for example looking for idle nodes and
creating the host file.

My system is set up with a master/login node running a full version of Linux
and slave nodes running a reduced Linux (which includes access to the X
libraries). wwmpirun always picks the slave nodes to run on. I've also tried
"ssh -Y" and it doesn't help. I've set xhost for the slave nodes in my login
shell on the master and that didn't work. X forwarding is enabled on all of
the nodes, so that's not the problem.

I am able to get it to work by having wwmpirun run the command "ssh -X node
xclock" before starting the parallel program on that same node, but this only
works for the first person who logs into the master and gets
DISPLAY=localhost:10. When someone else tries to run a parallel job, it seems
that DISPLAY is set to localhost:10 on the slaves, so it tries to forward
through the other person's login with the same display number, and the
connection is refused because of wrong authentication. This seems like very
odd behavior. I'm aware that this may be an issue with the X server (xorg) or
with the version of Linux, so I am also seeking help from the person who
maintains caos linux. If it matters, the machine uses Myrinet for the
interconnects.

Thanks!
Dave

Galen Shipman wrote:

> what does your command line look like?
>
> - Galen
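One way to see exactly what each MPI process inherits is to run a tiny
diagnostic under the same wwmpirun/mpirun options. The sketch below is my own
example (not part of pyMPI or the wrapper); it just prints the node name and
the DISPLAY each rank ends up with, which makes collisions like two users both
getting localhost:10 easy to spot:

/* Small diagnostic sketch: each rank reports which node it is on and what
 * DISPLAY it inherited, so colliding or stale display settings show up
 * immediately. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    const char *disp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);
    disp = getenv("DISPLAY");
    printf("rank %d on %s: DISPLAY=%s\n", rank, host,
           disp ? disp : "(unset)");
    MPI_Finalize();
    return 0;
}

If launched with the same options as the real job, its output shows whether
the refused connections line up with a shared display number.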
Re: [OMPI users] x11 forwarding
Actually, I believe at least some of this may be a bug on our part. We
currently pick up the local environment and forward it on to the remote nodes
as the environment for use by the back-end processes. I have seen quite a few
environment variables in that list, including DISPLAY, which would create the
problem you are seeing.

I'll have to chat with folks here to understand what part of the environment
we absolutely need to carry forward, and what parts we need to "cleanse"
before passing it along.

Ralph
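To illustrate the kind of "cleansing" Ralph mentions, a launcher could walk
the local environment and drop node-specific variables such as DISPLAY before
constructing the environment it forwards to the remote daemons. This is only a
sketch of the idea, not Open MPI's actual launch code, and the skip list is a
hypothetical example:

/* Illustration only -- not Open MPI's launcher code. Copies the local
 * environment for the remote processes but drops variables that only make
 * sense on the node where mpirun was invoked (DISPLAY being the obvious
 * example). */
#include <stdio.h>
#include <string.h>

extern char **environ;

static int is_node_local(const char *entry)
{
    /* Hypothetical skip list; a real launcher would need a vetted set. */
    static const char *skip[] = { "DISPLAY=", "XAUTHORITY=", NULL };
    for (int i = 0; skip[i] != NULL; i++)
        if (strncmp(entry, skip[i], strlen(skip[i])) == 0)
            return 1;
    return 0;
}

int main(void)
{
    for (char **e = environ; *e != NULL; e++)
        if (!is_node_local(*e))
            printf("forward: %s\n", *e);  /* would go to the remote daemon */
    return 0;
}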
Re: [OMPI users] x11 forwarding
I don't think that is the problem. As far as I can tell, the DISPLAY
environment variable is being set properly on the slave (it will sometimes
have a different value than in the shell where mpirun was executed).

Dave

Ralph H Castain wrote:

> We currently pick up the local environment and forward it on to the remote
> nodes as the environment for use by the back-end processes. I have seen
> quite a few environment variables in that list, including DISPLAY, which
> would create the problem you are seeing.
Re: [OMPI users] For Open MPI + BPROC users
Galen Shipman wrote:

> We have found a potential issue with BPROC that may affect Open MPI. Open
> MPI by default uses PTYs for I/O forwarding; if PTYs aren't set up on the
> compute nodes, Open MPI will revert to using pipes. Recently (today) we
> found a potential issue with PTYs and BPROC. A simple reader/writer using
> PTYs causes the writer to hang in uninterruptible sleep. The consistency of
> the process table between the head node and the back-end nodes is also
> affected: "bps" shows no writer process, while "bpsh NODE ps aux" shows the
> writer process in uninterruptible sleep. Since Open MPI uses PTYs by default
> on BPROC, this results in orted or MPI processes being orphaned on compute
> nodes. The workaround for this issue is to configure Open MPI with
> --disable-pty-support and rebuild.

The mpirun manual says that standard input is redirected from /dev/null, and
that standard output of remote nodes will be attached to the node that invoked
mpirun. If this is all caused by some buglet with BPROC I/O forwarding,
perhaps it would help if the slave nodes were invoked with the equivalent of
"bpsh -N"? I wonder if some people see the problem and others don't, depending
on the stdout (or its absence) from different applications?
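For testing whether a given compute node exhibits the PTY hang described
above, a minimal reader/writer along these lines may be useful. This is only a
generic sketch, not the exact test referenced in the post, and it assumes a
Linux-style openpty() (link with -lutil):

/* Minimal PTY writer/reader sketch: the parent writes through the master
 * side of a pseudo-terminal while the child reads from the slave side. On a
 * healthy node this runs to completion; a writer stuck in uninterruptible
 * sleep would point to the BPROC/PTY problem described above.
 * Build with: gcc ptytest.c -lutil */
#include <pty.h>      /* openpty(); on some systems <util.h> or <libutil.h> */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int master, slave;
    char buf[128];

    if (openpty(&master, &slave, NULL, NULL, NULL) != 0) {
        perror("openpty");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {                    /* child: reader on the slave side */
        close(master);
        ssize_t n = read(slave, buf, sizeof(buf));
        if (n > 0)
            printf("reader got %zd bytes\n", n);
        close(slave);
        _exit(0);
    }

    close(slave);                      /* parent: writer on the master side */
    const char *msg = "hello through the pty\n";
    if (write(master, msg, strlen(msg)) < 0)
        perror("write");
    close(master);
    waitpid(pid, NULL, 0);
    return 0;
}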