Sorry, typo. I have dual X5660, not X5560:
http://ark.intel.com/products/47921/Intel-Xeon-Processor-X5660-12M-Cache-2_80-GHz-6_40-GTs-Intel-QPI?q=x5660
On 29 January 2014 21:02, Reuti <re...@staff.uni-marburg.de> wrote:

> Quoting Victor <victor.ma...@gmail.com>:
>
>> Thanks for the reply Reuti,
>>
>> There are two machines: Node1 with 12 physical cores (dual 6-core Xeon) and
>
> Do you have this CPU?
>
> http://ark.intel.com/de/products/37109/Intel-Xeon-Processor-X5560-8M-Cache-2_80-GHz-6_40-GTs-Intel-QPI
>
> -- Reuti
>
>> Node2 with 4 physical cores (i5-2400).
>>
>> Regarding scaling on the single 12-core node, no, it is also not linear. In fact it is downright strange. I do not remember the exact numbers right now, but 10 jobs are faster than 11, and 12 are the fastest, with a peak performance of approximately 66 Msu/s, which is also far from triple the 4-core performance. This odd non-linear behaviour also happens at lower job counts on that 12-core node. I understand the decrease in scaling with increasing core count on the single node, as memory bandwidth is an issue.
>>
>> On the 4-core machine the scaling is progressive, i.e. every additional job brings an increase in performance. A single core delivers 8.1 Msu/s while 4 cores deliver 30.8 Msu/s. This is almost linear.
>>
>> Since my original email I have also installed Open-MX and recompiled OpenMPI to use it. This has resulted in approximately 10% better performance using the existing GbE hardware.
>>
>> On 29 January 2014 19:40, Reuti <re...@staff.uni-marburg.de> wrote:
>>
>>> On 29.01.2014 at 03:00, Victor wrote:
>>>
>>> > I am running the CFD simulation benchmark cavity3d available in http://www.palabos.org/images/palabos_releases/palabos-v1.4r1.tgz
>>> >
>>> > It is a parallel-friendly Lattice Boltzmann solver library.
>>> >
>>> > Palabos provides benchmark results for cavity3d for several different platforms and parameter settings here: http://wiki.palabos.org/plb_wiki:benchmark:cavity_n400
>>> >
>>> > The problem that I have is that the benchmark performance on my cluster does not scale even close to linearly.
>>> >
>>> > My cluster configuration:
>>> >
>>> > Node1: Dual Xeon 5560, 48 GB RAM
>>> > Node2: i5-2400, 24 GB RAM
>>> >
>>> > Gigabit Ethernet connection on eth0
>>> >
>>> > OpenMPI 1.6.5 on Ubuntu 12.04.3
>>> >
>>> > Hostfile:
>>> >
>>> > Node1 -slots=4 -max-slots=4
>>> > Node2 -slots=4 -max-slots=4
>>> >
>>> > MPI command: mpirun --mca btl_tcp_if_include eth0 --hostfile /home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400
>>> >
>>> > Problem:
>>> >
>>> > cavity3d 400
>>> >
>>> > When I run mpirun -np 4 on Node1 I get 35.7615 Mega site updates per second.
>>> > When I run mpirun -np 4 on Node2 I get 30.7972 Mega site updates per second.
>>> > When I run mpirun --mca btl_tcp_if_include eth0 --hostfile /home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400 I get 47.3538 Mega site updates per second.
>>> >
>>> > I understand that there are latencies with GbE and that there is MPI overhead, but this performance scaling still seems very poor. Are my expectations of scaling naive, or is there actually something wrong and fixable that will improve the scaling? Optimistically, I would like each node to add to the cluster performance, not slow it down.
>>> >
>>> > Things get even worse if I run an asymmetric number of MPI jobs on each node. For instance, running -np 12 on Node1
>>>
>>> Isn't this overloading the machine with only 8 real cores in total?
>>>
>>> > is significantly faster than running -np 16 across Node1 and Node2, thus adding Node2 actually slows down the performance.
>>>
>>> The i5-2400 has only 4 cores and no threads.
>>>
>>> How much data has to be exchanged between the processes depends on the algorithm, and this can indeed be worse across a network.
>>>
>>> Also: is the algorithm scaling linearly when used on Node1 only with 8 cores? When it's 35.7615 with 4 cores, what result do you get with 8 cores on this machine?
>>>
>>> -- Reuti
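
A note on the slot layout discussed above: with the corrected hardware (dual X5660, i.e. 12 physical cores on Node1 and 4 on Node2), a hostfile that matches the physical core counts might look like the sketch below. This is only an illustration; Open MPI hostfiles normally use the keywords slots and max_slots without leading dashes, and the host names and file path are simply the ones already used in this thread.

  Node1 slots=12 max_slots=12
  Node2 slots=4  max_slots=4

With 16 slots defined, something like "mpirun --mca btl_tcp_if_include eth0 --hostfile /home/mpiuser/.mpi_hostfile -np 16 ./cavity3d 400" should place 12 ranks on Node1 and 4 on Node2 rather than oversubscribing either machine. Whether that actually beats -np 12 on Node1 alone still depends on how much traffic crosses the GbE link.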
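
A quick back-of-the-envelope check on the numbers quoted in the thread (rough arithmetic only):

  Node2 alone:         30.7972 / (4 * 8.1)           = ~0.95  ->  ~95% parallel efficiency from 1 to 4 cores
  Both nodes (-np 8):  47.3538 / (35.7615 + 30.7972) = ~0.71  ->  ~71% of the sum of the two 4-core runs

So roughly 29% of the combined 4+4 throughput is lost once the GbE link is involved, which is consistent with Reuti's point that the cost depends on how much data the algorithm has to exchange between processes.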
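
On the Open-MX rebuild mentioned above, a rough sequence is sketched below. This is an assumption-laden illustration, not the exact commands used in the thread: the install prefixes are hypothetical, --with-mx is the configure option Open MPI 1.6.x provided for MX-compatible libraries (Open-MX exposes the MX API over plain Ethernet), and ompi_info can confirm which MX components the resulting build actually contains.

  # hypothetical paths; assumes Open-MX is already installed and its kernel module is loaded on both nodes
  ./configure --prefix=/opt/openmpi-1.6.5 --with-mx=/opt/open-mx
  make -j 12 && make install

  # then select the MX path at run time, e.g. via the MX MTL
  mpirun --mca pml cm --mca mtl mx --hostfile /home/mpiuser/.mpi_hostfile -np 16 ./cavity3d 400

Checking "ompi_info | grep mx" before benchmarking avoids silently falling back to the TCP BTL.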