On Thu, Aug 19, 2010 at 9:03 PM, Rahul Nabar <rpna...@gmail.com> wrote:
> ------------------------------------------------------------------
> gather:
> NP256 hangs
> NP128 hangs
> NP64 hangs
> NP32 OK
>
> Note: "gather" always hangs at the following line of the test:
> #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
> [snip]
> 4096 1000 525.80 527.69 526.79
> ------------------------------------------------------------------
What I thought was a permanent "hang" in the NP64 "gather" test was, in fact, an exceedingly long stall. After I waited more than 7 minutes, the test ran through to completion. What is surprising is the _huge_ jump in times between the 4096-byte and 8192-byte message sizes: t_min steps from about 275 to 1380 usecs. Any ideas what could cause this, and whether it could be related to the other "hangs" I am seeing? We are using jumbo frames with an MTU of 9000, so that was one thought I had about this transition.

On the other hand, that doesn't seem to explain the "hang" in the NP256 bcast test. That one stayed hung for more than an hour, at which point I killed it.

Just to make sure this isn't a quirk or bug in the Intel-IMB test suite: are there any alternative test suites I could run on my cluster? I am a bit iffy about the Intel-IMB suite because I have found no active forums or mailing lists that focus on it, so I can't get in touch with any users or developers who might have insight into how these benchmarks run.
7m22.972s

# /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 64 gather
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Gather
#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 64
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.02 0.03 0.02
1 1000 68.72 68.95 68.84
2 1000 69.16 69.39 69.28
4 1000 68.85 69.08 68.97
8 1000 69.02 69.25 69.14
16 1000 70.29 70.51 70.40
32 1000 72.14 72.38 72.27
64 1000 70.99 71.24 71.12
128 1000 72.59 72.84 72.72
256 1000 76.00 76.26 76.14
512 1000 84.92 85.21 85.06
1024 1000 101.69 102.01 101.84
2048 1000 146.94 147.41 147.18
4096 1000 275.61 276.45 276.04
8192 13 1380.54 1607.84 1522.64
16384 13 1497.09 1749.46 1656.61
32768 13 2055.61 2380.37 2259.50
65536 13 4553.46 5002.70 4837.14
131072 13 7720.76 8926.69 8483.07
262144 13 10423.99 12027.23 11440.07
524288 13 19456.94 22369.62 21317.78
1048576 13 38228.53 43892.99 41880.94
2097152 13 99705.55 119614.62 115667.49
4194304 10 425823.38 496396.78 468326.45
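For anyone who wants to check the step change in the table above mechanically (or compare it across runs with different NP values), here is a minimal Python sketch. It assumes the five-column IMB layout shown (bytes, repetitions, t_min, t_max, t_avg in usec). The helper names `parse_rows` and `largest_jump` are hypothetical, not part of IMB. It only compares t_avg between consecutive message sizes; it says nothing about *why* the jump happens.

```python
"""Locate the largest step change between consecutive message sizes in an IMB table (sketch)."""

# Rows copied from the NP64 Gather run above:
# bytes, repetitions, t_min, t_max, t_avg (all times in usec).
IMB_GATHER_NP64 = """\
0 1000 0.02 0.03 0.02
1 1000 68.72 68.95 68.84
2 1000 69.16 69.39 69.28
4 1000 68.85 69.08 68.97
8 1000 69.02 69.25 69.14
16 1000 70.29 70.51 70.40
32 1000 72.14 72.38 72.27
64 1000 70.99 71.24 71.12
128 1000 72.59 72.84 72.72
256 1000 76.00 76.26 76.14
512 1000 84.92 85.21 85.06
1024 1000 101.69 102.01 101.84
2048 1000 146.94 147.41 147.18
4096 1000 275.61 276.45 276.04
8192 13 1380.54 1607.84 1522.64
16384 13 1497.09 1749.46 1656.61
32768 13 2055.61 2380.37 2259.50
65536 13 4553.46 5002.70 4837.14
131072 13 7720.76 8926.69 8483.07
262144 13 10423.99 12027.23 11440.07
524288 13 19456.94 22369.62 21317.78
1048576 13 38228.53 43892.99 41880.94
2097152 13 99705.55 119614.62 115667.49
4194304 10 425823.38 496396.78 468326.45
"""


def parse_rows(text):
    """Return (bytes, reps, t_min, t_max, t_avg) tuples, skipping non-data lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or line.startswith("#"):
            continue
        nbytes, reps = int(fields[0]), int(fields[1])
        t_min, t_max, t_avg = (float(f) for f in fields[2:])
        rows.append((nbytes, reps, t_min, t_max, t_avg))
    return rows


def largest_jump(rows):
    """Return (smaller_size, larger_size, t_avg ratio) for the biggest consecutive step.

    The 0-byte row is skipped as a baseline because its near-zero time
    would make any ratio against it meaningless.
    """
    best = None
    for prev, cur in zip(rows, rows[1:]):
        if prev[0] == 0:
            continue
        ratio = cur[4] / prev[4]
        if best is None or ratio > best[2]:
            best = (prev[0], cur[0], ratio)
    return best


if __name__ == "__main__":
    lo, hi, ratio = largest_jump(parse_rows(IMB_GATHER_NP64))
    print(f"largest step: {lo} -> {hi} bytes, t_avg x{ratio:.2f}")
```

On this data it reports the 4096 -> 8192 byte step, where t_avg grows by a factor of roughly 5.5. That factor is noticeably larger than the roughly 2x per doubling seen elsewhere in the table.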