2009/1/6 Ralph Castain <r...@lanl.gov>

> On Jan 5, 2009, at 5:19 PM, Jeff Squyres wrote:
>
>> On Jan 5, 2009, at 5:01 PM, Maciej Kazulak wrote:
>>
>>> Interesting though. I thought in such a simple scenario shared memory
>>> would be used for IPC (or whatever's fastest). But nope. Even with one
>>> process still it wants to use TCP/IP to communicate between mpirun and
>>> orted.
>>
>> Correct -- we only have TCP enabled for MPI process <--> orted
>> communication. There are several reasons why; the simplest is that this
>> is our "out of band" channel and it is only used to setup and tear down
>> the job. As such, we don't care that it's a little slower than other
>> possible channels (such as sm). MPI traffic will use shmem,
>> OpenFabrics-based networks, Myrinet, ...etc. But not MPI process <-->
>> orted communication.
>>
>>> What's even more surprising to me it won't use loopback for that. Hence
>>> my maybe a little bit over-restrictive iptables rules were the problem.
>>> I allowed traffic from 127.0.0.1 to 127.0.0.1 on lo but not from
>>> <eth0_addr> to <eth0_addr> on eth0 and both processes ended up waiting
>>> for IO.
>>>
>>> Can I somehow configure it to use something other than TCP/IP here? Or
>>> at least switch it to loopback?
>>
>> I don't remember how it works in the v1.2 series offhand; I think it's
>> different in the v1.3 series (where all MPI processes *only* talk to the
>> local orted, vs. MPI processes making direct TCP connections back to
>> mpirun and any other MPI process with which it needs to bootstrap other
>> communication channels). I'm *guessing* that the MPI process <--> orted
>> communication either uses a named unix socket or TCP loopback. Ralph --
>> can you explain the details?
>
> In the 1.2 series, mpirun spawns a local orted to handle all local procs.
> The code that discovers local interfaces specifically ignores any
> interfaces that are not up or are just local loopbacks. My guess is that
> the person who wrote that code long, long ago was assuming that the sole
> purpose was to talk to remote nodes, not to loop back onto yourself.
>
> I imagine it could be changed to include loopback, but I would first need
> to work with other developers to ensure there are no unexpected
> consequences in doing so. Since no TCP interface is found, mpirun fails.
>
> In the 1.3 series, mpirun handles the local procs itself. Thus, this
> issue does not appear and things run just fine.
>
> Ralph
>
>> --
>> Jeff Squyres
>> Cisco Systems
Thanks for the answer. I think I'll just update my firewall rules for now and wait for the 1.3 release.
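
In case anyone else hits the same thing with 1.2: what bit me was that I had only opened up 127.0.0.1 <-> 127.0.0.1 on lo, while the mpirun <--> orted connection uses the eth0 address. As far as I can tell, on Linux self-addressed traffic is still delivered over the lo interface even when it uses the eth0 address, so matching on the interface alone (instead of pinning the addresses to 127.0.0.1) should cover both cases. A rough sketch of what I mean (192.168.1.10 is just a stand-in for my <eth0_addr>, adjust to your setup):

    # Accept everything on the loopback interface, whichever local address is used
    iptables -A INPUT  -i lo -j ACCEPT
    iptables -A OUTPUT -o lo -j ACCEPT

    # Or, more narrowly, only let the node's own address talk to itself
    # (192.168.1.10 is a placeholder for <eth0_addr>)
    iptables -A INPUT -s 192.168.1.10 -d 192.168.1.10 -j ACCEPT

That's what I'm planning to try here, anyway.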