Jeff,

Thanks for the reply; I realize you guys must be really busy with the recent release of Open MPI. I tried 1.1 and I don't get error messages any more, but the code now hangs with no error or exit, so I am not sure if this is the same issue or something else. I am enclosing my source code. I compiled with icc and linked against an icc-compiled version of openmpi-1.1.
My program is a set of network benchmarks (a crude kind of NetPIPE) that checks the typical message-passing patterns in my application codes. Typical output is:

32 CPUs: sync call time = 1003.0

                        time (s)                               rate (Mbytes/s)                        bandwidth (MBits/s)
loop buffers  size      XC       XE       GS       MS          XC       XE       GS       MS          XC       XE       GS       MS
   1      64  16384     2.48e-02 1.99e-02 1.21e+00 3.88e-02    4.23e+01 5.28e+01 8.65e-01 2.70e+01    1.08e+04 1.35e+04 4.43e+02 1.38e+04
   2      64  16384     2.17e-02 2.09e-02 1.21e+00 4.10e-02    4.82e+01 5.02e+01 8.65e-01 2.56e+01    1.23e+04 1.29e+04 4.43e+02 1.31e+04
   3      64  16384     2.20e-02 1.99e-02 1.01e+00 3.95e-02    4.77e+01 5.27e+01 1.04e+00 2.65e+01    1.22e+04 1.35e+04 5.33e+02 1.36e+04
   4      64  16384     2.16e-02 1.96e-02 1.25e+00 4.00e-02    4.85e+01 5.36e+01 8.37e-01 2.62e+01    1.24e+04 1.37e+04 4.28e+02 1.34e+04
   5      64  16384     2.25e-02 2.00e-02 1.25e+00 4.07e-02    4.66e+01 5.24e+01 8.39e-01 2.57e+01    1.19e+04 1.34e+04 4.30e+02 1.32e+04
   6      64  16384     2.19e-02 1.99e-02 1.29e+00 4.05e-02    4.79e+01 5.28e+01 8.14e-01 2.59e+01    1.23e+04 1.35e+04 4.17e+02 1.33e+04
   7      64  16384     2.19e-02 2.06e-02 1.25e+00 4.03e-02    4.79e+01 5.09e+01 8.38e-01 2.60e+01    1.23e+04 1.30e+04 4.29e+02 1.33e+04
   8      64  16384     2.24e-02 2.06e-02 1.25e+00 4.01e-02    4.69e+01 5.09e+01 8.39e-01 2.62e+01    1.20e+04 1.30e+04 4.30e+02 1.34e+04
   9      64  16384     4.29e-01 2.01e-02 6.35e-01 3.98e-02    2.45e+00 5.22e+01 1.65e+00 2.64e+01    6.26e+02 1.34e+04 8.46e+02 1.35e+04
  10      64  16384     2.16e-02 2.06e-02 8.87e-01 4.00e-02    4.85e+01 5.09e+01 1.18e+00 2.62e+01    1.24e+04 1.30e+04 6.05e+02 1.34e+04

Time is the total for all 64 buffers. Rate is one way across one link (number of bytes / time).

1) XC is a bidirectional ring exchange. Each processor sends to the right and receives from the left.
2) XE is an edge exchange. Pairs of nodes exchange data, with each one sending and receiving.
3) GS is MPI_Allreduce.
4) MS is my version of MPI_Allreduce. It splits the vector into Np blocks (Np is the number of processors); each processor then acts as the head node for one block. This uses the full bandwidth all the time, unlike MPI_Allreduce, which thins out as it gets to the top of the binary tree. On a 64-node Infiniband system MS is about 5X faster than GS; in theory it would be 6X, i.e. log_2(64). Here it is 25X, and I am not sure why it is so much. (I sketch XC and MS at the end of this message.)

But MS seems to be the cause of the hangs with messages > 64K. I can run the other benchmarks OK, but this one seems to hang for large messages.

I think the problem is at least partly due to the switch. All MS is doing is point-to-point communication, but unfortunately it sometimes requires high bandwidth between ASICs. At first it exchanges data between near neighbors in MPI_COMM_WORLD, but it must progressively span wider gaps between nodes as it goes up the various binary trees. After a while this requires extensive traffic between ASICs. This seems to be a problem on both my HP 2724 and my Extreme Networks Summit400t-48. I am currently working with Extreme to try to resolve the switch issue. As I say, the code ran great on Infiniband, but I think those switches have hardware flow control.

Finally, I checked the code again under LAM and it ran OK. Slow, but no hangs.

To run the code, compile and type:

mpirun -np 32 -machinefile hosts src/netbench 8

The 8 means 2^8 Kbytes (i.e. 256K); that was enough to hang every time on my boxes. You can also edit the header file (header.h): MAX_LOOPS is how many times it runs each test (currently 10); NUM_BUF is the number of buffers in each test (must be more than the number of processors); SYNC defines the global sync frequency (a global sync every SYNC buffers); NUM_SYNC is the number of sequential barrier calls used to determine the mean barrier call time. You can also switch the various tests on and off, which can be useful for debugging.
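For reference, here is a minimal sketch of the XC pattern. The function name and buffer handling are only illustrative; the attached source is the authoritative version:

#include <mpi.h>

/* Bidirectional ring exchange (XC): every rank sends one buffer to its
 * right neighbor and receives one from its left neighbor.  Using
 * MPI_Sendrecv keeps the ring from deadlocking once messages are too
 * big to be buffered eagerly. */
void ring_exchange(double *sendbuf, double *recvbuf, int count, MPI_Comm comm)
{
    int rank, np;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &np);

    int right = (rank + 1) % np;        /* destination rank */
    int left  = (rank - 1 + np) % np;   /* source rank */

    MPI_Sendrecv(sendbuf, count, MPI_DOUBLE, right, 0,
                 recvbuf, count, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);
}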
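And here is the shape of MS. My actual code builds both phases out of point-to-point sends over binary trees, so the two collective calls below are only a compact way of showing the idea; I also assume here, for brevity, that the vector length divides evenly by the number of processors:

#include <mpi.h>
#include <stdlib.h>

/* Block-distributed allreduce (MS): a reduce-scatter leaves rank i
 * holding the fully reduced block i of the vector, and an allgather
 * then distributes the reduced blocks to everyone.  Each link carries
 * count/np elements per step, so the bandwidth stays saturated instead
 * of thinning out toward the root of a single tree. */
void block_allreduce(double *in, double *out, int count, MPI_Comm comm)
{
    int np, i;
    MPI_Comm_size(comm, &np);

    int blk = count / np;                      /* assumes count % np == 0 */
    int *counts = malloc(np * sizeof *counts);
    double *myblk = malloc(blk * sizeof *myblk);
    for (i = 0; i < np; i++)
        counts[i] = blk;                       /* one equal block per rank */

    /* Phase 1: each processor acts as head node for one block. */
    MPI_Reduce_scatter(in, myblk, counts, MPI_DOUBLE, MPI_SUM, comm);

    /* Phase 2: collect the reduced blocks so every rank has the result. */
    MPI_Allgather(myblk, blk, MPI_DOUBLE, out, blk, MPI_DOUBLE, comm);

    free(myblk);
    free(counts);
}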
Tony

-------------------------------
Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005
Tel: 352-392-6509
FAX: 352-392-9513
Email: tl...@che.ufl.edu
Web: http://ladd.che.ufl.edu
[Attachment: src.tgz, application/compressed]