DJ Delorie wrote:
There's not much difference between multi-core and multi-cpu, and I've
been building multi-cpu for years.
Some multi-core processors come with less L2 cache than their multi-CPU
counterparts.
Also, multi-cpu itself comes in different varieties: Intel's Xeons go
for the classic SMP design with a single shared memory bus providing
uniform memory access (UMA), but also causing a bottleneck as you add
more processors, whereas the Opterons have non-uniform memory access
(NUMA), with each processor having its own memory bus.  That removes
the bottleneck of the shared memory bus, but the operating system must
allocate most memory locally to each CPU to avoid a bottleneck in the
cross-connect between the processors.
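
To make the "allocate locally" part concrete, here is a minimal
user-space sketch (assuming Linux with libnuma 2.x installed; build
with -lnuma; the 64 MB size is arbitrary) of asking which node the
current CPU sits on and allocating memory there, which is roughly what
the kernel's default policy tries to do for ordinary allocations anyway:

#define _GNU_SOURCE             /* for sched_getcpu() */
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    int cpu  = sched_getcpu();          /* CPU we are running on right now */
    int node = numa_node_of_cpu(cpu);   /* NUMA node that CPU belongs to   */
    printf("cpu %d is on node %d of %d\n",
           cpu, node, numa_num_configured_nodes());

    /* Ask for 64 MB preferring the local node; other nodes are only
     * used if the local one is exhausted. */
    size_t len = 64UL << 20;
    void *buf = numa_alloc_local(len);
    if (buf == NULL)
        return 1;

    /* ... touch and use buf; its pages are placed on the local node ... */
    numa_free(buf, len);
    return 0;
}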

The Athlon X2 and Opteron dual-core processors internally use a
crossbar switch to connect the two cores to the memory and the
inter-processor interconnect, so they have slightly higher memory
latencies, but communication between the two cores inside one
processor and its attached memory is better than between separate
Opteron processors with their attached memory.

I wonder: what does the Linux (or insert your favourite BSD here)
kernel do with dual-core Opterons?  Does it keep the physical memory
attached to a processor affine to the two cores of that processor?
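
I don't know the answer, but the kernel at least exports its idea of
the topology under /sys/devices/system/node, so you can see how it
groups cores and memory.  A rough sketch, assuming a kernel built with
CONFIG_NUMA and sysfs mounted (on older kernels the file is cpumap
rather than cpulist):

#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DIR *d = opendir("/sys/devices/system/node");
    if (d == NULL) {
        perror("/sys/devices/system/node");
        return 1;
    }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* node directories are named node0, node1, ... */
        if (strncmp(e->d_name, "node", 4) != 0 ||
            !isdigit((unsigned char)e->d_name[4]))
            continue;

        char path[256], cpus[256] = "?";
        snprintf(path, sizeof path,
                 "/sys/devices/system/node/%s/cpulist", e->d_name);

        FILE *f = fopen(path, "r");
        if (f != NULL) {
            if (fgets(cpus, sizeof cpus, f))
                cpus[strcspn(cpus, "\n")] = '\0';
            fclose(f);
        }
        printf("%s: cpus %s\n", e->d_name, cpus);
    }
    closedir(d);
    return 0;
}

On a dual-processor, dual-core Opteron I'd expect this to show two
nodes with two CPUs each, i.e. memory grouped per package rather than
per core.
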
Dual-processor, dual-core Opterons seem like a very cost-effective way
to get a 4-way machine, and each pair of cores shares two memory buses
(if the memory is appropriately installed), so memory bandwidth should
also be good.  But is there a penalty to pay because this machine is
neither quite classical SMP nor quite NUMA?
If the kernel pretends it's a fully NUMA machine (one node per core),
that would halve the local memory available per CPU - i.e. builds with
very asymmetric memory usage could get slower, since once the kernel
runs out of what it thinks is the memory local to one core, it has a
good chance (2/3 if you assume a random distribution) of grabbing
memory that is indeed not local to that core.
If it pretends the machine is a classic SMP machine with uniform
memory access, it's even worse, since then, irrespective of the size
of the working set, it will tend to grab memory from the wrong place
half of the time.
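
For what it's worth, on Linux a process can at least override whatever
default policy it gets: libnuma lets you ask for "prefer the local
node" or for interleaving pages across all nodes, which is more or
less the NUMA-vs-UMA choice above.  A rough sketch (assuming libnuma
2.x; build with -lnuma; the 64 MB buffers are arbitrary):

#define _GNU_SOURCE             /* sched_getcpu() */
#include <numa.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>

static void prefer_local(void)
{
    /* Prefer the node this thread runs on; fall back to remote nodes
     * only when the local one is exhausted - the "NUMA" choice. */
    numa_set_preferred(numa_node_of_cpu(sched_getcpu()));
}

static void interleave_all(void)
{
    /* Spread pages round-robin over all nodes; average latency is
     * worse, but no single node becomes the hot spot - roughly what
     * a classic UMA/SMP machine gives you. */
    numa_set_interleave_mask(numa_all_nodes_ptr);
}

int main(void)
{
    if (numa_available() < 0)
        return 1;

    size_t len = 64UL << 20;

    prefer_local();
    char *a = malloc(len);
    memset(a, 0, len);          /* pages of a land on the local node  */

    interleave_all();
    char *b = malloc(len);
    memset(b, 0, len);          /* pages of b are spread across nodes */

    free(a);
    free(b);
    return 0;
}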

At a previous job where we were very interested in build times, our
rule of thumb was N+1 jobs for N cpus with local disk, or 2N+1 jobs
for N cpus with nfs-mounted disk.  That was for build farms working on
a 12 hour C++ compile.
Was that UMA or NUMA, and how far up could you scale N usefully?
Do you know if software RAID (not so much R as A of ID) is effective at
avoiding I/O bottlenecks, e.g. will two disks for four cores work as well as one
disk for two?
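
On the job-count rule above, just to make it concrete, a trivial
sketch that turns the online CPU count into a -j value (the 1x/2x
factors are your rule of thumb, nothing standard; _SC_NPROCESSORS_ONLN
is a glibc extension):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);     /* online CPUs/cores */
    int nfs = argc > 1 && strcmp(argv[1], "nfs") == 0;
    long jobs = nfs ? 2 * n + 1 : n + 1;        /* 2N+1 for NFS, N+1 local */
    printf("%ld\n", jobs);                      /* e.g.: make -j`./jobs`   */
    return 0;
}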
