Hi Konstantin,
MPI_Alltoall_Isend_Irecv
This is a poorly scalable algorithm in SKaMPI: it simply posts N
MPI_Irecv's and N MPI_Isend's and then does a single MPI_Waitall. We
shouldn't have an issue on only 8 procs, but in general I would expect
the performance of this algorithm to degrade quite quickly as the
process count and message size grow, especially compared to Open MPI's
tuned collectives. I can dig into this a bit more if you send me your
.skampi file configured to run this particular benchmark.
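For reference, the pattern is roughly the following (just an
illustrative sketch, not SKaMPI's actual source; the function name,
buffer layout, and MPI_CHAR datatype are assumptions):

#include <mpi.h>
#include <stdlib.h>

/* Naive all-to-all: post a receive and a send for every peer,
 * then wait on all 2*N requests at once. */
static void alltoall_isend_irecv(char *sendbuf, char *recvbuf,
                                 int blocklen, MPI_Comm comm)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    MPI_Request *reqs = malloc(2 * nprocs * sizeof(MPI_Request));

    for (int i = 0; i < nprocs; i++)          /* one Irecv per peer */
        MPI_Irecv(recvbuf + (size_t)i * blocklen, blocklen, MPI_CHAR,
                  i, 0, comm, &reqs[i]);
    for (int i = 0; i < nprocs; i++)          /* one Isend per peer */
        MPI_Isend(sendbuf + (size_t)i * blocklen, blocklen, MPI_CHAR,
                  i, 0, comm, &reqs[nprocs + i]);

    /* Every transfer is in flight at the same time, so the number of
     * concurrent messages across the job grows like N^2. */
    MPI_Waitall(2 * nprocs, reqs, MPI_STATUSES_IGNORE);

    free(reqs);
}

With long messages, all of those transfers compete for the wire at
once, which is why I'd expect this to fall further behind the tuned
algorithms as the process count grows.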
Thanks,
Galen
On Feb 4, 2006, at 9:37 AM, Konstantin Kudin wrote:
Dear Jeff and Galen,
I have tried openmpi-1.1a1r8890. The good news is that the oddly long
latencies for certain packet sizes seem to have gone away with the
options they showed up with before. Also, one version of all-to-all
appears to behave better when a specific set of parameters is given.
However, I still get only 1-CPU performance out of 8 with the actual
application, and all of that time is spent doing parallel FFTs. What is
interesting is that even with the tuned parameters, the other version
of all-to-all still performs quite poorly (see below).
#/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned -mca \
mpi_paffinity_alone 1 skampi41
2 272.1 3.7 8 272.1 3.7 8
3 1800.5 72.9 8 1800.5 72.9 8
4 3074.0 61.0 8 3074.0 61.0 8
5 5705.5 102.0 8 5705.5 102.0 8
6 8054.2 282.3 8 8054.2 282.3 8
7 9462.9 104.2 8 9462.9 104.2 8
8 11245.8 66.9 8 11245.8 66.9 8
mpirun -np 8 -mca btl tcp -mca coll self,basic,tuned -mca \
mpi_paffinity_alone 1 -mca coll_basic_crossover 8 skampi41
2 267.7 1.5 8 267.7 1.5 8
3 1591.2 8.4 8 1591.2 8.4 8
4 2704.4 17.1 8 2704.4 17.1 8
5 4813.7 307.9 3 4813.7 307.9 3
6 5329.1 57.0 2 5329.1 57.0 2
7 198767.6 49076.2 5 198767.6 49076.2 5
8 254832.6 11235.3 5 254832.6 11235.3 5
Still poor performance with the Isend/Irecv version:
#/*@insyncol_MPI_Alltoall_Isend_Irecv-nodes-long-SM.ski*/
2 235.0 0.7 8 235.0 0.7 8
3 1565.6 15.3 8 1565.6 15.3 8
4 2694.8 24.3 8 2694.8 24.3 8
5 11389.9 6971.9 6 11389.9 6971.9 6
6 249612.0 21102.1 2 249612.0 21102.1 2
7 239051.9 3915.0 2 239051.9 3915.0 2
8 262356.5 12324.6 2 262356.5 12324.6 2
Kostya
--- Jeff Squyres <jsquy...@open-mpi.org> wrote:
Greetings Konstantin.
Many thanks for this report. Another user submitted almost the same
issue earlier today (poor performance of Open MPI 1.0.x collectives;
see http://www.open-mpi.org/community/lists/users/2006/02/0558.php).
Let me provide an additional clarification on Galen's reply:
The collectives in Open MPI 1.0.x are known to be sub-optimal -- they
return correct results, but they are not optimized at all. This is
what Galen meant by "If I use the basic collectives then things do
fall apart with long messages, but this is expected". The
collectives in the Open MPI 1.1.x series (i.e., our current
development trunk) provide *much* better performance.
Galen ran his tests using the "tuned" collective module in the 1.1.x
series -- these are the "better" collectives that I referred to
above. This "tuned" module does not exist in the 1.0.x series.
You can download a 1.1.x nightly snapshot -- including the new
"tuned" module -- from here:
http://www.open-mpi.org/nightly/trunk/
If you get the opportunity, could you re-try your application with a
1.1 snapshot?