Greetings Konstantin.

Many thanks for this report. Another user submitted almost the same issue earlier today (poor performance of Open MPI 1.0.x collectives; see http://www.open-mpi.org/community/lists/users/2006/02/0558.php).

Let me provide an additional clarification on Galen's reply:

The collectives in Open MPI 1.0.x are known to be sub-optimal -- they return correct results, but they are not optimized at all. This is what Galen meant by "If I use the basic collectives then things do fall apart with long messages, but this is expected". The collectives in the Open MPI 1.1.x series (i.e., our current development trunk) provide *much* better performance.

Galen ran his tests using the "tuned" collective module in the 1.1.x series -- these are the "better" collectives that I referred to above. This "tuned" module does not exist in the 1.0.x series.

You can download a 1.1.x nightly snapshot -- including the new "tuned" module -- from here:

        http://www.open-mpi.org/nightly/trunk/

If you get the opportunity, could you re-try your application with a 1.1 snapshot?
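
Once you have a snapshot built, you can check that the "tuned" component
is present and force it at run time. Treat the following as a sketch --
the exact parameter names may differ in the snapshot you pick up, so
please double-check with ompi_info:

        # list the collective components compiled into your installation
        ompi_info | grep coll

        # raise the tuned module's priority so it wins over "basic"
        # (parameter name assumed to be coll_tuned_priority -- verify locally)
        mpirun -np 8 -mca mpi_paffinity_alone 1 -mca coll_tuned_priority 100 skampi41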


On Feb 2, 2006, at 6:10 PM, Konstantin Kudin wrote:

 Hi all,

 There seem to have been problems with the attachment. Here is the
report:

 I did some tests of Open MPI version 1.0.2a4r8848. My motivation was
an extreme degradation of all-to-all MPI performance on 8 cpus (it ran
about as fast as on 1 cpu). At the same time, MPICH 1.2.7 on 8 cpus runs
more like it does on 4 (not like on 1!).

 This was done using SKaMPI version 4.1, from:
http://liinwww.ira.uka.de/~skampi/skampi4.1.tar.gz

 The system is a bunch of dual Opterons connected by Gigabit Ethernet.

 The MPI operation I am most interested in is all-to-all exchange.

 First of all, there seem to be some problems with the logarithmic
approach. Here is what I mean. In the following, the first column is the
packet size, the next is the average time (microseconds), and the next
is the standard deviation. The test was done on 8 cpus (4 dual nodes).
mpirun -np 8 -mca mpi_paffinity_alone 1 skampi41
#/*@inp2p_MPI_Send-MPI_Iprobe_Recv.ski*/
#Description of the MPI_Send-MPI_Iprobe_Recv measurement:
       0      74.3      1.3      8      74.3      1.3      8
      16      77.4      2.1      8      77.4      2.1      8       0.0      0.0
      32     398.9    323.4    100     398.9    323.4    100       0.0      0.0
      64      80.7      2.3      9      80.7      2.3      9       0.0      0.0
      80      79.3      2.3     13      79.3      2.3     13       0.0      0.0

mpirun -np 8 -mca mpi_paffinity_alone 1 -mca coll_basic_crossover 8 skampi41
#/*@inp2p_MPI_Send-MPI_Iprobe_Recv.ski*/
#Description of the MPI_Send-MPI_Iprobe_Recv measurement:
       0      76.7      2.1      8      76.7      2.1      8
      16      75.8      1.5      8      75.8      1.5      8       0.0      0.0
      32      74.4      0.6      8      74.4      0.6      8       0.0      0.0
      64      76.3      0.4      8      76.3      0.4      8       0.0      0.0
      80      76.7      0.5      8      76.7      0.5      8       0.0      0.0

 These anomalously large times for certain packet sizes (either 16 or
32 bytes) when coll_basic_crossover is not raised to 8 show up across a
whole set of tests, so this is not a fluke.

 Next, the all-to-all results. The short test used 64x4-byte messages;
the long one used 16384x4-byte messages.
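
 In other words (if I read the SKaMPI sizes correctly as the
per-destination message length), the short test sends 64 x 4 = 256 bytes
to each peer, while the long test sends 16384 x 4 = 65536 bytes (64 KB)
to each peer -- so on 8 cpus each process pushes roughly 7 x 64 KB =
448 KB per all-to-all call.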

mpirun -np 8 -mca mpi_paffinity_alone 1 -mca coll_basic_crossover 8 skampi41
#/*@insyncol_MPI_Alltoall-nodes-short-SM.ski*/
       2      12.7      0.2      8      12.7      0.2      8
       3      56.1      0.3      8      56.1      0.3      8
       4      69.9      1.8      8      69.9      1.8      8
       5      87.0      2.2      8      87.0      2.2      8
       6      99.7      1.5      8      99.7      1.5      8
       7     122.5      2.2      8     122.5      2.2      8
       8     147.5      2.5      8     147.5      2.5      8

#/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
       2     188.5      0.3      8     188.5      0.3      8
       3    1680.5     16.6      8    1680.5     16.6      8
       4    2759.0     15.5      8    2759.0     15.5      8
       5    4110.2     34.0      8    4110.2     34.0      8
       6   75443.5  44383.9      6   75443.5  44383.9      6
       7  242133.4    870.5      2  242133.4    870.5      2
       8  252436.7   4016.8      8  252436.7   4016.8      8

mpirun -np 8 -mca mpi_paffinity_alone 1 -mca coll_basic_crossover 8 \
    -mca coll_sm_info_num_procs 8 -mca btl_tcp_sndbuf 8388608 \
    -mca btl_tcp_rcvbuf 8388608 skampi41
#/*@insyncol_MPI_Alltoall-nodes-short-SM.ski*/
       2      13.1      0.1      8      13.1      0.1      8
       3      57.4      0.3      8      57.4      0.3      8
       4      73.7      1.6      8      73.7      1.6      8
       5      87.1      2.0      8      87.1      2.0      8
       6     103.7      2.0      8     103.7      2.0      8
       7     118.3      2.4      8     118.3      2.4      8
       8     146.7      3.1      8     146.7      3.1      8

#/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
       2     185.8      0.6      8     185.8      0.6      8
       3    1760.4     17.3      8    1760.4     17.3      8
       4    2916.8     52.1      8    2916.8     52.1      8
       5  106993.4 102562.4      2  106993.4 102562.4      2
       6  260723.1   6679.1      2  260723.1   6679.1      2
       7  240225.2   6369.8      6  240225.2   6369.8      6
       8  250848.1   4863.2      6  250848.1   4863.2      6


mpirun -np 8 -mca mpi_paffinity_alone 1 -mca coll_basic_crossover 8 \
    -mca coll_sm_info_num_procs 8 -mca btl_tcp_sndbuf 8388608 \
    -mca btl_tcp_rcvbuf 8388608 -mca btl_tcp_min_send_size 32768 \
    -mca btl_tcp_max_send_size 65536 skampi41
#/*@insyncol_MPI_Alltoall-nodes-short-SM.ski*/
       2      13.5      0.2      8      13.5      0.2      8
       3      57.3      1.8      8      57.3      1.8      8
       4      68.8      0.5      8      68.8      0.5      8
       5      83.2      0.6      8      83.2      0.6      8
       6     102.9      1.8      8     102.9      1.8      8
       7     117.4      2.3      8     117.4      2.3      8
       8     149.3      2.1      8     149.3      2.1      8

#/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
       2     187.5      0.5      8     187.5      0.5      8
       3    1661.1     33.4      8    1661.1     33.4      8
       4    2715.9      6.9      8    2715.9      6.9      8
       5  116805.2  43036.4      8  116805.2  43036.4      8
       6  163177.7  41363.4      7  163177.7  41363.4      7
       7  233105.5  20621.4      2  233105.5  20621.4      2
       8  332049.5  83860.5      2  332049.5  83860.5      2


The same tests with MPICH 1.2.7 (sockets, no shared memory):
#/*@insyncol_MPI_Alltoall-nodes-short-SM.ski*/
       2     312.5    106.5    100     312.5    106.5    100
       3     546.9    136.2    100     546.9    136.2    100
       4    2929.7    195.3    100    2929.7    195.3    100
       5    2070.3    203.7    100    2070.3    203.7    100
       6    2929.7    170.0    100    2929.7    170.0    100
       7    1328.1    186.0    100    1328.1    186.0    100
       8    3203.1    244.4    100    3203.1    244.4    100

#/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
       2     390.6    117.8    100     390.6    117.8    100
       3    3164.1    252.6    100    3164.1    252.6    100
       4    5859.4    196.3    100    5859.4    196.3    100
       5   15234.4   6895.1     30   15234.4   6895.1     30
       6   18136.2   5563.7     14   18136.2   5563.7     14
       7   14204.5   2898.0     11   14204.5   2898.0     11
       8   11718.8   1594.7      4   11718.8   1594.7      4

So, as one can see, MPICH latencies are much higher for small packets,
yet things are far more consistent for larger ones. Depending on the
settings, Open MPI degrades at either 5 or 6 cpus.

 Konstantin






--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/

