Tony -- My apologies for taking so long to answer. :-(
I was unfortunately unable to replicate your problem. I ran your source code
across 32 machines connected by TCP with no problem:

  mpirun --hostfile ~/mpi/cdc -np 32 -mca btl tcp,self netbench 8

I tried this on two different clusters with the same results -- it didn't
hang. :-(

Can you try again with a recent nightly tarball, or the 1.1.1 beta tarball
that has been posted?

  http://www.open-mpi.org/software/ompi/v1.1/


On 6/30/06 8:35 AM, "Tony Ladd" <l...@che.ufl.edu> wrote:

> Jeff
>
> Thanks for the reply; I realize you guys must be really busy with the
> recent release of Open MPI. I tried 1.1 and I don't get error messages
> any more, but the code now hangs; no error or exit. So I am not sure if
> this is the same issue or something else. I am enclosing my source code.
> I compiled with icc and linked against an icc-compiled version of
> openmpi-1.1.
>
> My program is a set of network benchmarks (a crude kind of netpipe) that
> checks typical message-passing patterns in my application codes. Typical
> output is:
>
> 32 CPUs: sync call time = 1003.0
>
>                              time                               rate (Mbytes/s)                      bandwidth (MBits/s)
> loop buffers  size     XC       XE       GS       MS        XC       XE       GS       MS         XC       XE       GS       MS
>    1      64 16384  2.48e-02 1.99e-02 1.21e+00 3.88e-02   4.23e+01 5.28e+01 8.65e-01 2.70e+01   1.08e+04 1.35e+04 4.43e+02 1.38e+04
>    2      64 16384  2.17e-02 2.09e-02 1.21e+00 4.10e-02   4.82e+01 5.02e+01 8.65e-01 2.56e+01   1.23e+04 1.29e+04 4.43e+02 1.31e+04
>    3      64 16384  2.20e-02 1.99e-02 1.01e+00 3.95e-02   4.77e+01 5.27e+01 1.04e+00 2.65e+01   1.22e+04 1.35e+04 5.33e+02 1.36e+04
>    4      64 16384  2.16e-02 1.96e-02 1.25e+00 4.00e-02   4.85e+01 5.36e+01 8.37e-01 2.62e+01   1.24e+04 1.37e+04 4.28e+02 1.34e+04
>    5      64 16384  2.25e-02 2.00e-02 1.25e+00 4.07e-02   4.66e+01 5.24e+01 8.39e-01 2.57e+01   1.19e+04 1.34e+04 4.30e+02 1.32e+04
>    6      64 16384  2.19e-02 1.99e-02 1.29e+00 4.05e-02   4.79e+01 5.28e+01 8.14e-01 2.59e+01   1.23e+04 1.35e+04 4.17e+02 1.33e+04
>    7      64 16384  2.19e-02 2.06e-02 1.25e+00 4.03e-02   4.79e+01 5.09e+01 8.38e-01 2.60e+01   1.23e+04 1.30e+04 4.29e+02 1.33e+04
>    8      64 16384  2.24e-02 2.06e-02 1.25e+00 4.01e-02   4.69e+01 5.09e+01 8.39e-01 2.62e+01   1.20e+04 1.30e+04 4.30e+02 1.34e+04
>    9      64 16384  4.29e-01 2.01e-02 6.35e-01 3.98e-02   2.45e+00 5.22e+01 1.65e+00 2.64e+01   6.26e+02 1.34e+04 8.46e+02 1.35e+04
>   10      64 16384  2.16e-02 2.06e-02 8.87e-01 4.00e-02   4.85e+01 5.09e+01 1.18e+00 2.62e+01   1.24e+04 1.30e+04 6.05e+02 1.34e+04
>
> Time is the total for all 64 buffers; rate is one way across one link
> (number of bytes / time).
>
> 1) XC is a bidirectional ring exchange. Each processor sends to the
>    right and receives from the left.
> 2) XE is an edge exchange. Pairs of nodes exchange data, with each one
>    sending and receiving.
> 3) GS is MPI_AllReduce.
> 4) MS is my version of MPI_AllReduce. It splits the vector into Np
>    blocks (Np is the number of processors); each processor then acts as
>    a head node for one block. This uses the full bandwidth all the time,
>    unlike AllReduce, which thins out as it gets to the top of the binary
>    tree. On a 64-node Infiniband system MS is about 5X faster than GS;
>    in theory it would be 6X, i.e. log_2(64). Here it is 25X -- not sure
>    why so much.
>
> But MS seems to be the cause of the hangups with messages > 64K. I can
> run the other benchmarks OK, but this one seems to hang for large
> messages. I think the problem is at least partly due to the switch. All
> MS is doing is point-to-point communication, but unfortunately it
> sometimes requires high bandwidth between ASICs.
> It first exchanges data between near neighbors in MPI_COMM_WORLD, but it
> must progressively span wider gaps between nodes as it goes up the
> various binary trees. After a while this requires extensive traffic
> between ASICs. This seems to be a problem on both my HP 2724 and the
> Extreme Networks Summit400t-48. I am currently working with Extreme to
> try to resolve the switch issue. As I say, the code ran great on
> Infiniband, but I think those switches have hardware flow control.
> Finally, I checked the code again under LAM and it ran OK. Slow, but no
> hangs.
>
> To run the code, compile and type:
>
>   mpirun -np 32 -machinefile hosts src/netbench 8
>
> The 8 means 2^8 bytes (i.e. 256K). This was enough to hang every time on
> my boxes.
>
> You can also edit the header file (header.h). MAX_LOOPS is how many
> times it runs each test (currently 10); NUM_BUF is the number of buffers
> in each test (must be more than the number of processors); SYNC defines
> the global sync frequency (every SYNC buffers); NUM_SYNC is the number
> of sequential barrier calls it uses to determine the mean barrier call
> time. You can also switch the various tests on and off, which can be
> useful for debugging.
>
> Tony
>
> -------------------------------
> Tony Ladd
> Professor, Chemical Engineering
> University of Florida
> PO Box 116005
> Gainesville, FL 32611-6005
>
> Tel: 352-392-6509
> FAX: 352-392-9513
> Email: tl...@che.ufl.edu
> Web: http://ladd.che.ufl.edu
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
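
The XC pattern described in the quoted message (each rank sends a buffer to
its right-hand neighbor and receives one from its left) can be reproduced in
a few lines of MPI. The sketch below is illustrative rather than the netbench
source: the buffer size, the double datatype, and the use of MPI_Sendrecv are
assumptions made for brevity.

  /* xc_ring.c -- illustrative sketch of the XC bidirectional ring exchange
     (not taken from netbench; buffer size and datatype are assumptions) */
  #include <mpi.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank, np, i;
      const int count = 2048;              /* 2048 doubles = 16 KB buffer */
      double *sendbuf, *recvbuf;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &np);

      sendbuf = malloc(count * sizeof(double));
      recvbuf = malloc(count * sizeof(double));
      for (i = 0; i < count; i++)
          sendbuf[i] = (double) rank;

      /* send to the right neighbor, receive from the left neighbor */
      MPI_Sendrecv(sendbuf, count, MPI_DOUBLE, (rank + 1) % np, 0,
                   recvbuf, count, MPI_DOUBLE, (rank - 1 + np) % np, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      free(sendbuf);
      free(recvbuf);
      MPI_Finalize();
      return 0;
  }

MPI_Sendrecv pairs the send and receive in one call, so the ring cannot
deadlock the way a ring of plain blocking MPI_Send calls can once messages
exceed the eager threshold.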
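
The MS reduction described above -- split the vector into Np blocks, let each
rank act as head node for one block, then redistribute the reduced blocks --
is essentially a reduce-scatter followed by an allgather. The sketch below
expresses that idea with standard MPI collectives instead of netbench's
hand-rolled point-to-point trees; the function name, the use of MPI_SUM on
doubles, and the requirement that the vector length divide evenly by the
number of ranks are all assumptions.

  /* ms_allreduce.c -- illustrative sketch of a block-wise all-reduce
     (reduce-scatter + allgather); not the netbench implementation */
  #include <mpi.h>
  #include <stdlib.h>

  /* Reduce 'vec' (length n, assumed divisible by the number of ranks)
     across 'comm' so that every rank ends up with the summed vector. */
  void block_allreduce(double *vec, int n, MPI_Comm comm)
  {
      int np, i;
      MPI_Comm_size(comm, &np);

      int blk = n / np;                    /* block owned by each rank */
      int *counts = malloc(np * sizeof(int));
      double *myblock = malloc(blk * sizeof(double));
      for (i = 0; i < np; i++)
          counts[i] = blk;

      /* each rank receives the fully reduced values for its own block */
      MPI_Reduce_scatter(vec, myblock, counts, MPI_DOUBLE, MPI_SUM, comm);

      /* every rank gathers all np reduced blocks -> same result as
         MPI_Allreduce, but the reduction work is spread over all ranks */
      MPI_Allgather(myblock, blk, MPI_DOUBLE, vec, blk, MPI_DOUBLE, comm);

      free(myblock);
      free(counts);
  }

Either formulation ends up exchanging blocks between ranks that sit
progressively farther apart, which is exactly where Tony suspects the
inter-ASIC bandwidth of his edge switches becomes the bottleneck.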