On Tue, 12 Aug 2008, Gus Correa wrote:
> Hello Daniel and list
>
> Could it be a problem with memory bandwidth / contention in multi-core?

Yes, I believe we are somehow limited by memory performance. Here are
some numbers from a dual Opteron 2352 system, which has much more memory
bandwidth:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.86         0.00
            1         1000         0.97         0.98
            2         1000         0.95         2.01
            4         1000         0.96         3.97
            8         1000         0.95         7.99
           16         1000         0.96        15.85
           32         1000         0.99        30.69
           64         1000         0.97        63.09
          128         1000         1.02       119.68
          256         1000         1.18       207.25
          512         1000         1.40       348.77
         1024         1000         1.75       556.75
         2048         1000         2.59       753.22
         4096         1000         5.10       766.23
         8192         1000         7.93       985.13
        16384         1000        14.60      1070.57
        32768         1000        27.92      1119.23
        65536          640        46.67      1339.16
       131072          320        86.03      1453.06
       262144          160       163.16      1532.21
       524288           80       310.01      1612.88
      1048576           40       730.62      1368.69
      2097152           20      1449.72      1379.57
      4194304           10      2884.90      1386.53

However, +/- 1200 MB/s (or +/- 1500 MB/s in the case of the AMD system)
is not even close to the memory performance limits of these systems, so
there should be room for optimization. After all, the openib btl manages
to transfer the data from the memory of one process to the memory of
another process just fine, at higher performance.

> It has been reported in many mailing lists (mpich, beowulf, etc).
> Here it seems to happen in dual-processor dual-core with our memory
> intensive programs.

MPICH2 manages to get about 5 GB/s in shared memory performance on the
Xeon 5420 system.

> Have you checked what happens to the shared memory runs as you
> increase the number of active cores/processes?
> Would it help to set the processor affinity in the shared memory runs?
>
> http://www.open-mpi.org/faq/?category=building#build-paffinity
> http://www.open-mpi.org/faq/?category=tuning#using-paffinity

Neither has any effect on the scores.

Daniël
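
[Editor's note: for readers unfamiliar with how the t[usec] and Mbytes/sec
columns above are produced, here is a minimal ping-pong sketch in C. It is
not the IMB source; the message size and repetition count are assumptions
chosen to mirror one row of the table. Rank 0 bounces a buffer off rank 1,
and the one-way time is taken as half the averaged round-trip time.]

/* minimal ping-pong sketch (assumed parameters, not the IMB code) */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i;
    const int reps  = 1000;        /* matches the #repetitions column  */
    const int bytes = 1048576;     /* 1 MiB, one of the sizes above    */
    char *buf = malloc(bytes);
    double t0, t_oneway;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {           /* send, then wait for the echo     */
            MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {    /* echo the message back            */
            MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    /* one-way latency = round-trip time / 2, averaged over reps */
    t_oneway = (MPI_Wtime() - t0) / (2.0 * reps);

    if (rank == 0)
        printf("%d bytes: %.2f usec, %.2f MB/s\n",
               bytes, t_oneway * 1e6, bytes / t_oneway / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}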
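
[Editor's note: the FAQ links quoted above describe Open MPI's own
processor-affinity support, which is enabled through MCA parameters rather
than application code. Purely as an illustration of what "pinning" a rank
to a core means at the OS level, here is a hedged, Linux-specific sketch;
the rank-to-core mapping is an assumption and this is not Open MPI's
paffinity mechanism.]

/* illustrative only: pin rank N to core N on Linux */
#define _GNU_SOURCE
#include <sched.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* keep each ping-pong process on a fixed core so it does not
     * migrate between sockets during the measurement */
    CPU_ZERO(&mask);
    CPU_SET(rank, &mask);          /* assumed: core IDs 0..N-1 exist  */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");

    /* ... run the ping-pong loop here ... */

    MPI_Finalize();
    return 0;
}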