Dear Galen,

It actually turns out that there is a problem not only with MPI_Alltoall_Isend_Irecv, but also with another related operation, insyncol_MPI_Alltoallv-nodes-long-SM.ski (this is what seems to be holding back the FFTs; I checked the source code, and it uses alltoallv):
#/*@insyncol_MPI_Alltoallv-nodes-long-SM.ski*/
2 250.8 1.0 8 250.8 1.0 8
3 1779.6 27.0 8 1779.6 27.0 8
4 2975.1 45.8 8 2975.1 45.8 8
5 4413.1 76.0 8 4413.1 76.0 8
6 93370.6 42900.6 8 93370.6 42900.6 8
7 199634.4 43273.1 8 199634.4 43273.1 8
8 262469.6 5896.3 8 262469.6 5896.3 8

The .skampi file I am using is the standard one that came with version
4.1, with only one notable change:

@STANDARDERRORDEFAULT 100.00

Thanks!

Kostya

--- "Galen M. Shipman" <gship...@lanl.gov> wrote:

> Hi Konstantin,
>
> > MPI_Alltoall_Isend_Irecv
>
> This is a very unscalable algorithm in skampi, as it simply posts N
> MPI_Irecv's and MPI_Isend's and then does a Waitall. We shouldn't
> have an issue on 8 procs, though, but in general I would expect the
> performance of this algorithm to degrade quite quickly, especially
> compared to Open MPI's tuned collectives. I can dig into this a bit
> more if you send me your .skampi file configured to run this
> particular benchmark.
>
> Thanks,
>
> Galen
>
>
> On Feb 4, 2006, at 9:37 AM, Konstantin Kudin wrote:
>
> > Dear Jeff and Galen,
> >
> > I have tried openmpi-1.1a1r8890. The good news is that it seems like
> > the freaky long latencies for certain packet sizes went away with
> > the options they showed up with before. Also, one version of
> > all-to-all appears to behave more nicely with a specified set of
> > parameters. However, I still get only 1-cpu performance out of 8
> > with the actual application, and all this time is spent doing
> > parallel FFTs. What is interesting is that even with the tuned
> > parameters, the other version of all-to-all still performs quite
> > poorly (see below).
> >
> > #/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
> > mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned -mca \
> >   mpi_paffinity_alone 1 skampi41
> > 2 272.1 3.7 8 272.1 3.7 8
> > 3 1800.5 72.9 8 1800.5 72.9 8
> > 4 3074.0 61.0 8 3074.0 61.0 8
> > 5 5705.5 102.0 8 5705.5 102.0 8
> > 6 8054.2 282.3 8 8054.2 282.3 8
> > 7 9462.9 104.2 8 9462.9 104.2 8
> > 8 11245.8 66.9 8 11245.8 66.9 8
> >
> > mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned -mca \
> >   mpi_paffinity_alone 1 -mca coll_basic_crossover 8 skampi41
> > 2 267.7 1.5 8 267.7 1.5 8
> > 3 1591.2 8.4 8 1591.2 8.4 8
> > 4 2704.4 17.1 8 2704.4 17.1 8
> > 5 4813.7 307.9 3 4813.7 307.9 3
> > 6 5329.1 57.0 2 5329.1 57.0 2
> > 7 198767.6 49076.2 5 198767.6 49076.2 5
> > 8 254832.6 11235.3 5 254832.6 11235.3 5
> >
> > Still poor performance:
> >
> > #/*@insyncol_MPI_Alltoall_Isend_Irecv-nodes-long-SM.ski*/
> > 2 235.0 0.7 8 235.0 0.7 8
> > 3 1565.6 15.3 8 1565.6 15.3 8
> > 4 2694.8 24.3 8 2694.8 24.3 8
> > 5 11389.9 6971.9 6 11389.9 6971.9 6
> > 6 249612.0 21102.1 2 249612.0 21102.1 2
> > 7 239051.9 3915.0 2 239051.9 3915.0 2
> > 8 262356.5 12324.6 2 262356.5 12324.6 2
> >
> > Kostya
> >
> > --- Jeff Squyres <jsquy...@open-mpi.org> wrote:
> >
> >> Greetings Konstantin.
> >>
> >> Many thanks for this report. Another user submitted almost the
> >> same issue earlier today (poor performance of Open MPI 1.0.x
> >> collectives; see
> >> http://www.open-mpi.org/community/lists/users/2006/02/0558.php).
> >>
> >> Let me provide an additional clarification on Galen's reply:
> >>
> >> The collectives in Open MPI 1.0.x are known to be sub-optimal --
> >> they return correct results, but they are not optimized at all.
> >> This is what Galen meant by "If I use the basic collectives then
> >> things do fall apart with long messages, but this is expected".
> >> The collectives in the Open MPI 1.1.x series (i.e., our current
> >> development trunk) provide *much* better performance.
> >>
> >> Galen ran his tests using the "tuned" collective module in the
> >> 1.1.x series -- these are the "better" collectives that I referred
> >> to above. This "tuned" module does not exist in the 1.0.x series.
> >>
> >> You can download a 1.1.x nightly snapshot -- including the new
> >> "tuned" module -- from here:
> >>
> >> http://www.open-mpi.org/nightly/trunk/
> >>
> >> If you get the opportunity, could you re-try your application with
> >> a 1.1 snapshot?
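P.S. For reference, the MPI_Alltoall_Isend_Irecv pattern Galen describes
above -- post N MPI_Irecv's and N MPI_Isend's, then a single Waitall --
looks roughly like the sketch below. This is only a minimal illustration
of that pattern, not the actual skampi source; the function name, buffer
layout, and use of MPI_CHAR are illustrative. MPI_Alltoallv is just the
vector variant of the same exchange (per-peer counts and displacements),
so it puts the same kind of pressure on the network.

#include <mpi.h>
#include <stdlib.h>

/* Naive all-to-all: one nonblocking receive and one nonblocking send
 * per peer, then a single MPI_Waitall over all 2*size requests.
 * "count" is the number of bytes exchanged with each peer. */
void naive_alltoall(char *sendbuf, char *recvbuf, int count, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);

    MPI_Request *reqs = malloc(2 * size * sizeof(MPI_Request));

    /* Post all receives first ... */
    for (int i = 0; i < size; i++)
        MPI_Irecv(recvbuf + (size_t)i * count, count, MPI_CHAR,
                  i, 0, comm, &reqs[i]);

    /* ... then all sends. */
    for (int i = 0; i < size; i++)
        MPI_Isend(sendbuf + (size_t)i * count, count, MPI_CHAR,
                  i, 0, comm, &reqs[size + i]);

    /* Everything is in flight at once: with long messages and many
     * peers this is why the pattern degrades quickly compared to
     * tuned collectives that schedule the exchange. */
    MPI_Waitall(2 * size, reqs, MPI_STATUSES_IGNORE);

    free(reqs);
}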