> Greg Schafer wrote:
>> A lot of seasoned SMP-building folks work on the basis of make -j X+1,
>> i.e. make -j3 if you have 2 cpus or 2 cores. As a person who has been
>> building in parallel for a long time, I strongly disagree with a comment
>> elsewhere in this thread about performance plummeting if overutilizing.
>
> (Note this is all theoretical: I haven't really tested it. I've built a
> few packages with -j2 on my dual-core machine, but I never looked into
> the optimal -j setting much. So if you use X+1 and it works well, that
> probably trumps my guesses.)
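For concreteness, the X+1 rule quoted above boils down to something like
the following. This is only an illustration, not a command from the
thread; it assumes bash arithmetic and either coreutils' nproc or
/proc/cpuinfo as the way to count cores:

    # one more make job than there are CPUs/cores
    make -j$(( $(nproc) + 1 ))

    # or, on systems without nproc:
    make -j$(( $(grep -c '^processor' /proc/cpuinfo) + 1 ))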
I recently got access to a dual-core laptop (1.6GHz Core 2 Duo, 512MB
RAM) and measured the machine's SBU using -j1, -j2, -j3, and -j4. The
SBU includes unpacking binutils, building, installing, and removing the
sources; configure and make install are not parallelized.

-j1: real 3m20.447s   user 2m18.153s   sys 0m36.218s
-j2: real 1m50.967s   user 2m8.160s    sys 0m32.510s
-j3: real 1m42.912s   user 2m7.948s    sys 0m33.666s
-j4: real 1m46.869s   user 2m8.840s    sys 0m33.970s

-j5 returns almost the same results as -j4. (A sketch of one way to
reproduce timings like these is at the end of this mail.)

I also wondered how many jobs Make actually creates when compiling
binutils with make -j, so I tried it. Make -j (with no arguments) creates
as many jobs as possible, and the results are interesting:

     real 2m28.837s   user 2m14.148s   sys 0m37.246s

> If you have one CPU, and make runs two jobs, *and* both jobs are
> CPU-bound, then performance will probably only be slightly worse than
> running one job. The overhead of switching between the two tasks will
> take some time, but not very much.

I agree with all you say, but I believe you're missing something more
important than context switches: cache and bus conflicts. These are the
main reason performance suffers when a core is running more than one
active thread. Dual-core CPUs are very sensitive to having their data
and instructions ready the instant they need them.

Also, each application requires special tuning to get every last drop of
performance out of multicore CPUs; Valgrind or Intel's VTune (an amazing
tool) really help here. Make's -j is a blunt hammer: it just throws
tasks at the cores without any special consideration for how that will
impact performance.

> And here's my guess for why X+1 works well: most compiles don't seem to
> be entirely CPU-bound. When compiling packages, I can see my (per-core)
> CPU usage, and it's not usually 100%. I suspect the cost of going after
> the disk to load the source files in and write the object files out
> (not to mention the temp files if you don't use -pipe) is much greater
> than the cost of parsing and optimizing a small C file. And lots of
> packages (maybe most?) seem to be made up of many small C files.

I'd say the same thing.

--
Miguel Bazdresch
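P.S. Here is a rough sketch of one way to collect per -j timings like the
ones above, assuming bash. The tarball name, build directory layout, and
configure options are assumptions for illustration, not the exact
commands behind the numbers in this mail:

    # Time an SBU-style binutils build at several -j levels (bash).
    # Only the make step is run in parallel; configure and make install
    # are serial, as noted above.
    for jobs in 1 2 3 4; do
        echo "=== make -j$jobs ==="
        time {
            tar -xf binutils-2.17.tar.bz2              # unpack sources
            mkdir binutils-build && cd binutils-build
            ../binutils-2.17/configure --prefix=/tools >/dev/null
            make -j$jobs >/dev/null                    # the parallel step
            make install >/dev/null
            cd .. && rm -rf binutils-2.17 binutils-build   # clean up
        }
    done

Each iteration prints a real/user/sys line like the ones quoted above.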