Thanks for the insights, Tim. I was aware that the CPUs will choke beyond a certain point. From memory, on my machine this happens with 5 concurrent MPI jobs with the benchmark I am using.
My primary question was about scaling between the nodes. I was not getting close to double the performance when running MPI jobs across two 4-core nodes. It may be better now that I have Open-MX in place, but I have not repeated the benchmarks yet since I need to get one simulation job done asap.

Regarding your mention of setting affinities and MPI ranks: do you have specific (as in syntactically specific, since I am a novice and easily confused...) examples of how I might set affinities to get the Westmere node performing better? ompi_info returns this:

    MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
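(A minimal sketch of what such an affinity setting could look like with the Open MPI 1.6-series mpirun flags, following Tim's 2-cores-per-rank suggestion. The rank and thread counts below are assumptions for the 12-core Westmere node, not tested settings, and the hybrid variant only helps if the solver is actually built with OpenMP support:

    # plain MPI: one rank per core, each rank pinned to its core
    mpirun --bind-to-core --bycore --report-bindings -np 12 ./cavity3d 400

    # hybrid MPI/OpenMP: 6 ranks, each bound to a pair of cores,
    # with 2 OpenMP threads per rank filling that pair
    export OMP_NUM_THREADS=2
    mpirun --bind-to-core --cpus-per-proc 2 -x OMP_NUM_THREADS \
           --report-bindings -np 6 ./cavity3d 400

--report-bindings prints the core map at launch, which is the easiest way to confirm that the binding actually did what was intended.)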
And finally, to hybridisation... in a week or so I will get 4 AMD A10-6800 machines with 8 GB each on loan and will attempt to make them work alongside the existing Intel nodes.

Victor

On 29 January 2014 22:03, Tim Prince <n...@aol.com> wrote:
>
> On 1/29/2014 8:02 AM, Reuti wrote:
>>
>> Quoting Victor <victor.ma...@gmail.com>:
>>
>>> Thanks for the reply Reuti,
>>>
>>> There are two machines: Node1 with 12 physical cores (dual 6-core Xeon)
>>> and
>>
>> Do you have this CPU?
>>
>> http://ark.intel.com/de/products/37109/Intel-Xeon-Processor-X5560-8M-Cache-2_80-GHz-6_40-GTs-Intel-QPI
>>
>> -- Reuti
>
> It's expected on the Xeon Westmere 6-core CPUs to see MPI performance
> saturating when all 4 of the internal bus paths are in use. For this
> reason, hybrid MPI/OpenMP with 2 cores per MPI rank, with affinity set so
> that each MPI rank has its own internal CPU bus, could out-perform plain
> MPI on those CPUs.
> That scheme of pairing cores on selected internal bus paths hasn't been
> repeated. Some influential customers learned to prefer the 4-core version
> of that CPU, given a reluctance to adopt MPI/OpenMP hybrid with affinity.
> If you want to talk about "downright strange," start thinking about the
> schemes to optimize performance of 8 threads with 2 threads assigned to
> each internal CPU bus on that CPU model. Or your scheme of trying to
> balance MPI performance between very different CPU models.
>
> Tim
>
>>> Node2 with 4 physical cores (i5-2400).
>>>
>>> Regarding scaling on the single 12-core node, no, it is also not linear.
>>> In fact it is downright strange. I do not remember the numbers right now,
>>> but 10 jobs are faster than 11, and 12 are the fastest, with peak
>>> performance of approximately 66 Msu/s, which is also far from triple the
>>> 4-core performance. This odd non-linear behaviour also happens at lower
>>> job counts on that 12-core node. I understand the decrease in scaling
>>> with increasing core count on the single node, as memory bandwidth is an
>>> issue.
>>>
>>> On the 4-core machine the scaling is progressive, i.e. every additional
>>> job brings an increase in performance. A single core delivers 8.1 Msu/s,
>>> while 4 cores deliver 30.8 Msu/s. This is almost linear.
>>>
>>> Since my original email I have also installed Open-MX and recompiled
>>> OpenMPI to use it. This has resulted in approximately 10% better
>>> performance using the existing GbE hardware.
>>>
>>> On 29 January 2014 19:40, Reuti <re...@staff.uni-marburg.de> wrote:
>>>
>>>> On 29.01.2014 at 03:00, Victor wrote:
>>>>
>>>> > I am running a CFD simulation benchmark, cavity3d, available within
>>>> > http://www.palabos.org/images/palabos_releases/palabos-v1.4r1.tgz
>>>> >
>>>> > It is a parallel-friendly lattice Boltzmann solver library.
>>>> >
>>>> > Palabos provides benchmark results for cavity3d on several different
>>>> > platforms and variables here:
>>>> > http://wiki.palabos.org/plb_wiki:benchmark:cavity_n400
>>>> >
>>>> > The problem that I have is that the benchmark performance on my
>>>> > cluster does not scale even close to linearly.
>>>> >
>>>> > My cluster configuration:
>>>> >
>>>> > Node1: Dual Xeon 5560, 48 GB RAM
>>>> > Node2: i5-2400, 24 GB RAM
>>>> >
>>>> > Gigabit Ethernet connection on eth0
>>>> >
>>>> > OpenMPI 1.6.5 on Ubuntu 12.04.3
>>>> >
>>>> > Hostfile:
>>>> >
>>>> > Node1 -slots=4 -max-slots=4
>>>> > Node2 -slots=4 -max-slots=4
>>>> >
>>>> > MPI command: mpirun --mca btl_tcp_if_include eth0 --hostfile
>>>> > /home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400
>>>> >
>>>> > Problem:
>>>> >
>>>> > cavity3d 400
>>>> >
>>>> > When I run mpirun -np 4 on Node1 I get 35.7615 mega site updates per
>>>> > second.
>>>> > When I run mpirun -np 4 on Node2 I get 30.7972 mega site updates per
>>>> > second.
>>>> > When I run mpirun --mca btl_tcp_if_include eth0 --hostfile
>>>> > /home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400 I get 47.3538 mega
>>>> > site updates per second.
>>>> >
>>>> > I understand that there are latencies with GbE and that there is MPI
>>>> > overhead, but this performance scaling still seems very poor. Are my
>>>> > expectations of scaling naive, or is there actually something wrong
>>>> > and fixable that will improve the scaling? Optimistically I would like
>>>> > each node to add to the cluster performance, not slow it down.
>>>> >
>>>> > Things get even worse if I run an asymmetric number of MPI jobs on
>>>> > each node. For instance, running -np 12 on Node1
>>>>
>>>> Isn't this overloading the machine with only 8 real cores in total?
>>>>
>>>> > is significantly faster than running -np 16 across Node1 and Node2;
>>>> > thus adding Node2 actually slows down the performance.
>>>>
>>>> The i5-2400 has only 4 cores and no threads.
>>>>
>>>> It depends on the algorithm how much data has to be exchanged between
>>>> the processes, and this can indeed be worse when used across a network.
>>>>
>>>> Also: does the algorithm scale linearly when used on node1 only with 8
>>>> cores? When it's 35.7615 with 4 cores, what result do you get with 8
>>>> cores on this machine?
>>>>
>>>> -- Reuti
>
> --
> Tim Prince
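(Tying Tim's affinity suggestion back to the hostfile and mpirun command quoted above: in the Open MPI 1.6 series the same two-node run can also be given an explicit placement with a rankfile. The lines below are only a sketch; the hostnames come from this thread, but the socket/core numbering is an assumption and would need to be checked against hwloc's lstopo on each node:

    # rankfile: 4 ranks on Node1 spread over both sockets, 4 ranks on Node2
    rank 0=Node1 slot=0:0
    rank 1=Node1 slot=0:1
    rank 2=Node1 slot=1:0
    rank 3=Node1 slot=1:1
    rank 4=Node2 slot=0:0
    rank 5=Node2 slot=0:1
    rank 6=Node2 slot=0:2
    rank 7=Node2 slot=0:3

    mpirun --mca btl_tcp_if_include eth0 -rf ./myrankfile \
           --report-bindings -np 8 ./cavity3d 400

The rankfile name is made up for the example; --report-bindings again confirms where each rank actually landed.)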