On Thu, Aug 19, 2010 at 9:03 PM, Rahul Nabar <rpna...@gmail.com> wrote:
> ------------------------------------------------------------------
> gather:
>    NP256    hangs
>    NP128    hangs
>    NP64    hangs
>    NP32    OK
>
> Note: "gather" always hangs at the following line of the test:
>       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> [snip]
>         4096         1000       525.80       527.69       526.79
> ------------------------------------------------------------------

What I thought was a permanent "hang" for the NP64 "gather" test was,
in fact, an exceedingly long stall. After more than 7 minutes the test
runs to completion. What is surprising is the _huge_ jump in times
between the 4096- and 8192-byte message sizes. It's a step change from
275 to 1380 usecs (t_min). Any ideas what could cause this, and whether
it could be related to the other "hangs" I am seeing? We are using
jumbo frames with an MTU of 9000, so that was one thought I had for
this transition.
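As a rough check on the jumbo-frame idea, here is a sketch that counts how many 9000-byte frames each message size needs. It assumes TCP over IPv4 with 40 bytes of headers (no options) per frame and ignores any MPI-level header. Since TCP is a byte stream, it is only an approximation. Under those assumptions, 8192 bytes still fits in a single frame, so a pure one-frame/two-frame boundary would be expected between 8192 and 16384 rather than between 4096 and 8192.

```python
import math

MTU = 9000          # jumbo-frame MTU from the post
IP_TCP_HEADER = 40  # assumption: IPv4 (20 B) + TCP (20 B) headers, no options

def frames_needed(payload_bytes: int, mtu: int = MTU, header: int = IP_TCP_HEADER) -> int:
    """Minimum number of MTU-sized frames needed to carry payload_bytes of TCP payload."""
    per_frame = mtu - header  # usable payload bytes per frame
    return math.ceil(payload_bytes / per_frame)

for size in (2048, 4096, 8192, 16384):
    print(f"{size:>6} bytes -> {frames_needed(size)} frame(s)")
```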

On the other hand, this doesn't seem to be the case for the NP256 bcast
"hang". That one stayed hung for more than an hour, at which point I
killed it.

Just to make sure this isn't some quirk or bug in the Intel-IMB test
suite, are there any alternative test suites that I could run on my
cluster? I was a bit iffy about the Intel-IMB test suite because I
have found no active forums or mailing lists that focus on it, so I
can't really get in touch with any users or developers who might have
insight into how these benchmarks run.

7m22.972s
# /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 64 gather

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Gather

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 64
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            1         1000        68.72        68.95        68.84
            2         1000        69.16        69.39        69.28
            4         1000        68.85        69.08        68.97
            8         1000        69.02        69.25        69.14
           16         1000        70.29        70.51        70.40
           32         1000        72.14        72.38        72.27
           64         1000        70.99        71.24        71.12
          128         1000        72.59        72.84        72.72
          256         1000        76.00        76.26        76.14
          512         1000        84.92        85.21        85.06
         1024         1000       101.69       102.01       101.84
         2048         1000       146.94       147.41       147.18
         4096         1000       275.61       276.45       276.04
         8192           13      1380.54      1607.84      1522.64
        16384           13      1497.09      1749.46      1656.61
        32768           13      2055.61      2380.37      2259.50
        65536           13      4553.46      5002.70      4837.14
       131072           13      7720.76      8926.69      8483.07
       262144           13     10423.99     12027.23     11440.07
       524288           13     19456.94     22369.62     21317.78
      1048576           13     38228.53     43892.99     41880.94
      2097152           13     99705.55    119614.62    115667.49
      4194304           10    425823.38    496396.78    468326.45
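To pin down where the step is, here is a small sketch that computes the t_min ratio between each message size and the previous one. It uses values copied from the NP64 Gather table above. The helper names are illustrative only.

```python
# (bytes, t_min[usec]) pairs copied from the NP64 Gather output above
ROWS = [
    (1024, 101.69),
    (2048, 146.94),
    (4096, 275.61),
    (8192, 1380.54),
    (16384, 1497.09),
    (32768, 2055.61),
]

def step_ratios(rows):
    """Return (bytes, ratio) where ratio = t_min at this size / t_min at the previous size."""
    return [(b, t / t_prev) for (_, t_prev), (b, t) in zip(rows, rows[1:])]

def largest_step(rows):
    """Return the (bytes, ratio) pair with the biggest jump over the previous size."""
    return max(step_ratios(rows), key=lambda r: r[1])

for size, ratio in step_ratios(ROWS):
    print(f"{size:>6} bytes: x{ratio:.2f}")
print("largest jump at", largest_step(ROWS)[0], "bytes")
```

Doubling the size elsewhere in this range costs roughly 1.1x to 1.9x. The 4096 to 8192 step costs about 5x. Note also that the repetition count drops from 1000 to 13 at the same size, so the timings from 8192 bytes up rest on far fewer samples.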
