How does that work?
The binaries have to get to all the machines of the cluster
somehow.
Does this assume you are using NFS or similar for your build
directory?
Won't the overhead of using that instead of local disk kill most of
the parallelization benefit of a cluster over a single SMP machine?
This will be true regardless of communication method. There is so
little opportunity for parallelism that anything more than 4-8 local
cores is pretty much wasted. On a 4-core machine, more than 50% of the
wall time is spent on things that will not use more than those 4 cores
regardless.
If the other 40-50% or so can be cut by a factor of 4 compared to
4-core execution, we are still talking about at most a 30% improvement
on the total wall time. Even a small serial overhead for communicating
sources and binaries will still reduce this 30%.
We need to improve the Makefiles before it makes sense to use more
parallelism. Otherwise we'll just keep running into Amdahl's law.
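The arithmetic behind that estimate can be sketched in a few lines.
This is only an illustration: the helper name wall_time_saved is made
up here, and the 40% and 50% inputs are the rough fractions quoted
above, not new measurements.

```python
def wall_time_saved(scalable_fraction, extra_speedup):
    """Fraction of total wall time saved when only `scalable_fraction`
    of the run becomes `extra_speedup` times faster (Amdahl's law)."""
    return scalable_fraction * (1.0 - 1.0 / extra_speedup)


# Rough fractions from the estimate above (assumed, not measured here).
for fraction in (0.40, 0.50):
    four_x = wall_time_saved(fraction, 4)
    # Upper bound: the scalable part takes no time at all.
    limit = wall_time_saved(fraction, float("inf"))
    print(f"{fraction:.0%} scalable: 4x faster saves {four_x:.1%}, "
          f"limit is {limit:.1%}")
```

With 40% of the wall time scalable, a 4x speedup of that part saves
30% overall; no amount of extra hardware can save more than the
scalable fraction itself.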
Some numbers, 16-core 64-thread POWER7, c,c++,fortran bootstrap:
-j6:
real 57m32.245s
user 205m51.480s
sys 6m24.043s
-j10:
real 45m55.034s
user 211m37.833s
sys 6m33.305s
-j15:
real 41m51.061s
user 237m26.174s
sys 7m2.341s
-j60:
real 38m18.583s
user 336m12.393s
sys 11m26.717s
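To read those timings as speedups, a small script along these lines
may help. It is a sketch only: the table is copied from the numbers
above, the helper names are invented for illustration, and
(user + sys) / real is used as a rough estimate of how many threads
were busy on average.

```python
import re

# -j value -> (real, user, sys), copied from the timings above.
TIMINGS = {
    6: ("57m32.245s", "205m51.480s", "6m24.043s"),
    10: ("45m55.034s", "211m37.833s", "6m33.305s"),
    15: ("41m51.061s", "237m26.174s", "7m2.341s"),
    60: ("38m18.583s", "336m12.393s", "11m26.717s"),
}


def seconds(text):
    """Convert a `time`-style string such as '57m32.245s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    return int(match.group(1)) * 60 + float(match.group(2))


def summarize(timings):
    """Return (jobs, speedup vs. the smallest -j, average busy threads)."""
    baseline = seconds(timings[min(timings)][0])
    rows = []
    for jobs in sorted(timings):
        real, user, system = (seconds(t) for t in timings[jobs])
        rows.append((jobs, baseline / real, (user + system) / real))
    return rows


for jobs, speedup, busy in summarize(TIMINGS):
    print(f"-j{jobs:<3} speedup vs -j6: {speedup:.2f}x  "
          f"avg busy threads: {busy:.1f}")
```

Going from -j6 to -j60 makes the bootstrap only about 1.5 times
faster in wall time, while user time grows from about 206 to about
336 minutes.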
Segher