On Mar 13, 2009, at 6:47 AM, Ricardo Fernández-Perea wrote:
On the same machine, the same job takes a lot more time when using XGrid than when using any other method. Even when all the orted daemons run on the same node, Xgrid uses tcp instead of sm. Is that expected, or do I have a problem?
This is unfortunately a known issue. Because XGrid doesn't give any way of knowing where processes will be launched until they have already started, and doesn't handle wire-up, I had to fake a couple of things when I initially wrote the code. In particular, our run-time really wanted to know if two processes were on the same node *before* the launch (so that it would know if they could share a control daemon). That part is still a problem, although it may be solvable now, given the changes made to the run-time since I wrote that code.
If the world were perfect, I'd launch only the executables and skip the daemons. The problem with that model is that xgrid's stdio forwarding is a little different from what most users expect. It is (or was) nearly impossible to get "real time" stdio output from the processes without handling it all ourselves, which requires the previously mentioned, slightly evil, daemons.
All this leads up to the short answer to your question - it's expected that two processes on the same node with xgrid will use tcp instead of shared memory for communication. This could probably be fixed with some extra coding, but unfortunately I'm totally swamped on another project (and trying to finish my thesis), so it's unlikely I'll be able to look at it for a while.
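To make the effect concrete, here is a toy model of the decision, not Open MPI's actual code. It assumes a hypothetical rule, pick_transport, that allows shared memory only when both processes are known to be on the same node, and a hypothetical wire_up helper that applies the rule to every pair of ranks. The node names and placement tables are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def pick_transport(node_a: Optional[str], node_b: Optional[str]) -> str:
    """Toy rule: use "sm" only when both nodes are known and identical."""
    if node_a is not None and node_a == node_b:
        return "sm"
    # Unknown or different nodes: fall back to the transport that always works.
    return "tcp"


def wire_up(placement: Dict[int, Optional[str]]) -> Dict[Tuple[int, int], str]:
    """Choose a transport for every pair of ranks in a placement table."""
    ranks = sorted(placement)
    return {
        (a, b): pick_transport(placement[a], placement[b])
        for i, a in enumerate(ranks)
        for b in ranks[i + 1:]
    }


# A launcher that reports placement before launch (hypothetical node names).
known = {0: "node1", 1: "node1", 2: "node2"}

# An xgrid-like launcher: placement is not known before launch, so the
# run-time has nothing to compare (modelled here as None).
unknown = {0: None, 1: None, 2: None}

if __name__ == "__main__":
    print(wire_up(known))
    print(wire_up(unknown))
```

With the known table, ranks 0 and 1 get sm and every pair involving rank 2 gets tcp. With the unknown table every pair falls back to tcp, even if the processes later turn out to share a node, which is the behaviour you are seeing.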
Brian

-- 
Brian Barrett
Open MPI developer
http://www.open-mpi.org/