Dear Jeff and Galen,

 I have tried openmpi-1.1a1r8890. The good news is that the freakishly
long latencies for certain packet sizes seem to have gone away, even
with the same options they used to show up with. Also, one version of
all-to-all behaves much more reasonably with the specified set of
parameters. However, I still get only 1-CPU performance out of 8 with
the actual application, and essentially all of that time is spent in
parallel FFTs. What is interesting is that even with the tuned
parameters, the other version of all-to-all still performs quite
poorly (see below).

#/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned \
       -mca mpi_paffinity_alone 1 skampi41
       2     272.1      3.7      8     272.1      3.7      8
       3    1800.5     72.9      8    1800.5     72.9      8
       4    3074.0     61.0      8    3074.0     61.0      8
       5    5705.5    102.0      8    5705.5    102.0      8
       6    8054.2    282.3      8    8054.2    282.3      8
       7    9462.9    104.2      8    9462.9    104.2      8
       8   11245.8     66.9      8   11245.8     66.9      8

mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned \
       -mca mpi_paffinity_alone 1 -mca coll_basic_crossover 8 skampi41
       2     267.7      1.5      8     267.7      1.5      8
       3    1591.2      8.4      8    1591.2      8.4      8
       4    2704.4     17.1      8    2704.4     17.1      8
       5    4813.7    307.9      3    4813.7    307.9      3
       6    5329.1     57.0      2    5329.1     57.0      2
       7  198767.6  49076.2      5  198767.6  49076.2      5
       8  254832.6  11235.3      5  254832.6  11235.3      5


 The Isend/Irecv version of all-to-all still shows poor performance
(a sketch of that pattern follows the numbers):

#/*@insyncol_MPI_Alltoall_Isend_Irecv-nodes-long-SM.ski*/
       2     235.0      0.7      8     235.0      0.7      8
       3    1565.6     15.3      8    1565.6     15.3      8
       4    2694.8     24.3      8    2694.8     24.3      8
       5   11389.9   6971.9      6   11389.9   6971.9      6
       6  249612.0  21102.1      2  249612.0  21102.1      2
       7  239051.9   3915.0      2  239051.9   3915.0      2
       8  262356.5  12324.6      2  262356.5  12324.6      2
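
 For comparison, that second test times a hand-rolled all-to-all built
from nonblocking point-to-point calls rather than the MPI_Alltoall
collective. The pattern is roughly the following (a sketch only, not
SKaMPI's exact implementation):

/* Rough sketch of an Isend/Irecv-based all-to-all: post all receives,
 * then all sends, then wait for everything.  Not SKaMPI's exact code. */
#include <mpi.h>
#include <stdlib.h>

/* Each rank exchanges `count` doubles with every rank, itself included. */
static void alltoall_isend_irecv(double *sendbuf, double *recvbuf,
                                 int count, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);

    MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));

    for (int peer = 0; peer < size; peer++)        /* post receives first */
        MPI_Irecv(recvbuf + (size_t)peer * count, count, MPI_DOUBLE,
                  peer, 0, comm, &reqs[peer]);
    for (int peer = 0; peer < size; peer++)        /* then the sends */
        MPI_Isend(sendbuf + (size_t)peer * count, count, MPI_DOUBLE,
                  peer, 0, comm, &reqs[size + peer]);

    MPI_Waitall(2 * size, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}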


 Kostya




--- Jeff Squyres <jsquy...@open-mpi.org> wrote:

> Greetings Konstantin.
> 
> Many thanks for this report.  Another user submitted almost the same
> issue earlier today (poor performance of Open MPI 1.0.x collectives;
> see http://www.open-mpi.org/community/lists/users/2006/02/0558.php).
> 
> Let me provide an additional clarification on Galen's reply:
> 
> The collectives in Open MPI 1.0.x are known to be sub-optimal -- they
> return correct results, but they are not optimized at all.  This is
> what Galen meant by "If I use the basic collectives then things do  
> fall apart with long messages, but this is expected".  The  
> collectives in the Open MPI 1.1.x series (i.e., our current  
> development trunk) provide *much* better performance.
> 
> Galen ran his tests using the "tuned" collective module in the 1.1.x
> series -- these are the "better" collectives that I referred to
> above.  This "tuned" module does not exist in the 1.0.x series.
> 
> You can download a 1.1.x nightly snapshot -- including the new  
> "tuned" module -- from here:
> 
>       http://www.open-mpi.org/nightly/trunk/
> 
> If you get the opportunity, could you re-try your application with a
> 1.1 snapshot?

