Dear Galen,

 It actually turns out that there is a problem not only with
MPI_Alltoall_Isend_Irecv, but also with another related operation,
insyncol_MPI_Alltoallv-nodes-long-SM.ski (this is what seems to be
holding back the FFTs; I checked the application's source code, and it
uses alltoallv):

#/*@insyncol_MPI_Alltoallv-nodes-long-SM.ski*/
       2     250.8      1.0      8     250.8      1.0      8
       3    1779.6     27.0      8    1779.6     27.0      8
       4    2975.1     45.8      8    2975.1     45.8      8
       5    4413.1     76.0      8    4413.1     76.0      8
       6   93370.6  42900.6      8   93370.6  42900.6      8
       7  199634.4  43273.1      8  199634.4  43273.1      8
       8  262469.6   5896.3      8  262469.6   5896.3      8

 The .skampi file I am using is the standard one that came with version
4.1, with only one notable change:
@STANDARDERRORDEFAULT 100.00
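
 For reference, the MPI_Alltoall_Isend_Irecv pattern that Galen
describes below (each rank posts N nonblocking receives and N
nonblocking sends, then waits on all of them at once) would look
roughly like the sketch that follows. This is only an illustration of
the general pattern, not SKaMPI's actual code, and the message size is
made up:

/* Sketch of an "Isend/Irecv" all-to-all: post every receive and every
 * send up front, then do a single Waitall.  Illustration only. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int count = 1024;  /* ints exchanged per pair of ranks (made up) */
    int *sendbuf = malloc((size_t)nprocs * count * sizeof(int));
    int *recvbuf = malloc((size_t)nprocs * count * sizeof(int));
    MPI_Request *reqs = malloc(2 * (size_t)nprocs * sizeof(MPI_Request));

    for (int i = 0; i < nprocs * count; i++)
        sendbuf[i] = rank;

    /* Post all N receives and all N sends without blocking... */
    int nreq = 0;
    for (int peer = 0; peer < nprocs; peer++)
        MPI_Irecv(recvbuf + (size_t)peer * count, count, MPI_INT,
                  peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);
    for (int peer = 0; peer < nprocs; peer++)
        MPI_Isend(sendbuf + (size_t)peer * count, count, MPI_INT,
                  peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);

    /* ...then wait on everything at once.  All the traffic hits the
     * network simultaneously, which is why this pattern tends to scale
     * poorly compared to a tuned MPI_Alltoall. */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

    free(sendbuf);
    free(recvbuf);
    free(reqs);
    MPI_Finalize();
    return 0;
}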

 Thanks!
 Kostya



--- "Galen M. Shipman" <gship...@lanl.gov> wrote:

> Hi Konstantin,
> 
> > MPI_Alltoall_Isend_Irecv
> 
> This is a very unscalable algorithm in SKaMPI: it simply posts N
> MPI_Irecv's and MPI_Isend's and then does a Waitall. We shouldn't
> have an issue on only 8 procs, but in general I would expect the
> performance of this algorithm to degrade quite quickly, especially
> compared to Open MPI's tuned collectives. I can dig into this a bit
> more if you send me the .skampi file you have configured to run this
> particular benchmark.
> 
> Thanks,
> 
> Galen
> 
> 
> On Feb 4, 2006, at 9:37 AM, Konstantin Kudin wrote:
> 
> >  Dear Jeff and Galen,
> >
> >  I have tried openmpi-1.1a1r8890. The good news is that it seems like
> > the freaky long latencies for certain packet sizes went away with the
> > options they showed up with before. Also, one version of all-to-all
> > appears to behave more nicely with a specified set of parameters.
> > However, I still get only 1-CPU performance out of 8 with the actual
> > application, and all of this time is spent doing parallel FFTs. What
> > is interesting is that even with the tuned parameters, the other
> > version of all-to-all still performs quite poorly (see below).
> >
> > #/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
> > mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned -mca  \
> > mpi_paffinity_alone 1 skampi41
> >        2     272.1      3.7      8     272.1      3.7      8
> >        3    1800.5     72.9      8    1800.5     72.9      8
> >        4    3074.0     61.0      8    3074.0     61.0      8
> >        5    5705.5    102.0      8    5705.5    102.0      8
> >        6    8054.2    282.3      8    8054.2    282.3      8
> >        7    9462.9    104.2      8    9462.9    104.2      8
> >        8   11245.8     66.9      8   11245.8     66.9      8
> >
> > mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned -mca  \
> > mpi_paffinity_alone 1  -mca coll_basic_crossover 8 skampi41
> >        2     267.7      1.5      8     267.7      1.5      8
> >        3    1591.2      8.4      8    1591.2      8.4      8
> >        4    2704.4     17.1      8    2704.4     17.1      8
> >        5    4813.7    307.9      3    4813.7    307.9      3
> >        6    5329.1     57.0      2    5329.1     57.0      2
> >        7  198767.6  49076.2      5  198767.6  49076.2      5
> >        8  254832.6  11235.3      5  254832.6  11235.3      5
> >
> >
> >  Still poor performance:
> >
> > #/*@insyncol_MPI_Alltoall_Isend_Irecv-nodes-long-SM.ski*/
> >        2     235.0      0.7      8     235.0      0.7      8
> >        3    1565.6     15.3      8    1565.6     15.3      8
> >        4    2694.8     24.3      8    2694.8     24.3      8
> >        5   11389.9   6971.9      6   11389.9   6971.9      6
> >        6  249612.0  21102.1      2  249612.0  21102.1      2
> >        7  239051.9   3915.0      2  239051.9   3915.0      2
> >        8  262356.5  12324.6      2  262356.5  12324.6      2
> >
> >
> >  Kostya
> >
> >
> >
> >
> > --- Jeff Squyres <jsquy...@open-mpi.org> wrote:
> >
> >> Greetings Konstantin.
> >>
> >> Many thanks for this report.  Another user submitted almost the same
> >> issue earlier today (poor performance of Open MPI 1.0.x collectives;
> >> see http://www.open-mpi.org/community/lists/users/2006/02/0558.php).
> >>
> >> Let me provide an additional clarification on Galen's reply:
> >>
> >> The collectives in Open MPI 1.0.x are known to be sub-optimal -- they
> >> return correct results, but they are not optimized at all.  This is
> >> what Galen meant by "If I use the basic collectives then things do
> >> fall apart with long messages, but this is expected".  The
> >> collectives in the Open MPI 1.1.x series (i.e., our current
> >> development trunk) provide *much* better performance.
> >>
> >> Galen ran his tests using the "tuned" collective module in the 1.1.x
> >> series -- these are the "better" collectives that I referred to
> >> above.  This "tuned" module does not exist in the 1.0.x series.
> >>
> >> You can download a 1.1.x nightly snapshot -- including the new
> >> "tuned" module -- from here:
> >>
> >>    http://www.open-mpi.org/nightly/trunk/
> >>
> >> If you get the opportunity, could you re-try your application with a
> >> 1.1 snapshot?
> >
> >