Hello,
I would really appreciate any advice on troubleshooting/tuning Open MPI over 
ConnectX. More details about our setup can be found here:
http://www.cse.scitech.ac.uk/disco/database/search-machine.php?MID=52
A single process per node (ppn=1) seems to be fine (the IMB results can be found 
here: http://www.cse.scitech.ac.uk/disco/database/search-pmb.php). However, there 
is a problem with Alltoall at ppn=8:
mpiexec --mca btl ^tcp -machinefile hosts32x8.txt -n 128 src/IMB-MPI1.openmpi -npmin 128 Alltoall
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.01         0.02         0.01
            1         1000        95.70        95.87        95.81
            2         1000       107.59       107.64       107.62
            4         1000       108.46       108.52       108.49
            8         1000       112.25       112.30       112.28
           16         1000       121.07       121.12       121.10
           32         1000       154.12       154.18       154.15
           64         1000       207.85       207.93       207.89
          128         1000       334.52       334.63       334.58
          256         1000      9303.66      9305.98      9304.99
          512         1000      8953.59      8955.71      8955.08
         1024         1000      8607.87      8608.78      8608.42
         2048         1000      8642.59      8643.30      8643.03
         4096         1000      8478.45      8478.64      8478.58

I’ve tried playing with various parameters, but to no avail (an example of the 
kind of run I mean is shown after the Gigabit results below). The step up at the 
same message size is noticeable for n=64 and n=32 as well, but progressively less 
so. Even more surprising is that Gigabit Ethernet performs better at this message 
size:
mpiexec --mca btl self,sm,tcp --mca btl_tcp_if_include eth1 -machinefile hosts32x8.txt -n 128 src/IMB-MPI1.openmpi -npmin 128 Alltoall
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            8         1000       598.66       599.11       598.95
           16         1000       723.07       723.48       723.29
           32         1000      1144.79      1145.46      1145.18
           64         1000      1850.25      1850.97      1850.66
          128         1000      3794.32      3795.23      3794.82
          256         1000      5653.55      5653.97      5653.81
          512         1000      7107.96      7109.90      7109.66
         1024         1000     10310.53     10315.90     10315.63
         2048         1000    350066.92    350152.90    350091.89
         4096         1000     42238.60     42239.53     42239.27
         8192         1000    112781.11    112782.55    112782.10
        16384         1000   2450606.75   2450625.01   2450617.86
Unfortunately this task never completes…
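
For illustration, this is the kind of parameter forcing I have in mind: selecting 
a specific Alltoall algorithm in the tuned collective component (the algorithm 
number here is only an example, and I am not sure these are even the right knobs 
for the 256-byte step):

mpiexec --mca btl ^tcp --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoall_algorithm 2 -machinefile hosts32x8.txt -n 128 src/IMB-MPI1.openmpi -npmin 128 Alltoall

(ompi_info --param coll tuned lists the selectable algorithms, and 
ompi_info --param btl openib shows the openib thresholds such as 
btl_openib_eager_limit.)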

Thanks in advance. Sorry for the long post.
Igor 

PS  I’m following the discussion on the slow sm btl but am not sure whether this 
particular problem is related. BTW, the Open MPI installation I’m using was built 
with the Intel compiler.
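
One sanity check in that direction might be to take sm out of the picture and run 
over openib only, e.g.

mpiexec --mca btl self,openib -machinefile hosts32x8.txt -n 128 src/IMB-MPI1.openmpi -npmin 128 Alltoall

If the step at 256 bytes is still there without sm, the slow sm issue is 
presumably unrelated.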
PPS  MVAPICH and MVAPICH2 behave much better, though not perfectly either. 
Unfortunately I have other problems with them.


I. Kozin  (i.kozin at dl.ac.uk)
STFC Daresbury Laboratory, WA4 4AD, UK
http://www.cse.clrc.ac.uk/disco 

