FWIW, we have made some improvements to shared memory performance in
the upcoming v1.3 series. I won't ask you to test a v1.3 tarball
right now because there's a gnarly bug in the shared memory support
that George is working to fix -- hopefully he'll fix it soon and you
can see if the performance is a bit better in v1.3.
On Aug 13, 2008, at 3:52 AM, Lenny Verkhovsky wrote:
Hi,
Just a quick check -- can you run with np 2?
(The PingPong test uses only 2 processes.)
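For example, something like this (binary name assumed -- adjust to
your IMB build):

    mpirun -np 2 ./IMB-MPI1 PingPong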
On 8/13/08, Daniël Mantione <daniel.manti...@clustervision.com> wrote:
On Tue, 12 Aug 2008, Gus Correa wrote:
> Hello Daniel and list
>
> Could it be a problem with memory bandwidth / contention in
> multi-core?
Yes, I believe we are somehow limited by memory performance. Here are
some numbers from a dual Opteron 2352 system, which has much more
memory bandwidth:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
    #bytes  #repetitions     t[usec]  Mbytes/sec
         0          1000        0.86        0.00
         1          1000        0.97        0.98
         2          1000        0.95        2.01
         4          1000        0.96        3.97
         8          1000        0.95        7.99
        16          1000        0.96       15.85
        32          1000        0.99       30.69
        64          1000        0.97       63.09
       128          1000        1.02      119.68
       256          1000        1.18      207.25
       512          1000        1.40      348.77
      1024          1000        1.75      556.75
      2048          1000        2.59      753.22
      4096          1000        5.10      766.23
      8192          1000        7.93      985.13
     16384          1000       14.60     1070.57
     32768          1000       27.92     1119.23
     65536           640       46.67     1339.16
    131072           320       86.03     1453.06
    262144           160      163.16     1532.21
    524288            80      310.01     1612.88
   1048576            40      730.62     1368.69
   2097152            20     1449.72     1379.57
   4194304            10     2884.90     1386.53
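(For context on what is being timed: the PingPong pattern is
essentially the loop below -- a minimal sketch for a single message
size, not IMB's actual source.)

/* Minimal ping-pong sketch (not IMB's actual source): rank 0 sends a
   buffer to rank 1, which sends it straight back. Half the average
   round-trip time is the reported t[usec], and bytes / t[usec] gives
   Mbytes/sec. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, repetitions = 1000;
    int bytes = 1048576;                  /* message size under test */
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < repetitions; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double usec = (t1 - t0) * 1e6 / repetitions / 2.0; /* one-way */
        printf("%d bytes: %.2f usec, %.2f Mbytes/sec\n",
               bytes, usec, bytes / usec); /* bytes/usec == Mbytes/sec */
    }
    free(buf);
    MPI_Finalize();
    return 0;
}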
However, about 1200 MB/s (or about 1500 MB/s in the case of the AMD
system) is not even close to the memory performance limits of these
systems, so there should be room for optimization.
After all, the openib btl manages to transfer data from the memory of
one process to the memory of another process just fine, at higher
performance.
> It has been reported in many mailing lists (mpich, beowulf, etc).
> Here it seems to happen in dual-processor dual-core with our
> memory-intensive programs.
MPICH2 manages to get about 5GB/s in shared memory performance on the
Xeon 5420 system.
> Have you checked what happens to the shared memory runs as you
> increase the number of active cores/processes?
> Would it help to set the processor affinity in the shared memory
> runs?
>
> http://www.open-mpi.org/faq/?category=building#build-paffinity
> http://www.open-mpi.org/faq/?category=tuning#using-paffinity
Neither has any effect on the scores.
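(For the archives: per the second FAQ link above, affinity can be
enabled from the command line with something like

    mpirun --mca mpi_paffinity_alone 1 -np 2 ./IMB-MPI1 PingPong

where the binary name is just an example; it made no difference here.)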
Daniël
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems