On Thu, 2005-12-15 at 13:48 +0100, Patrick Lauer wrote:
> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

-O2 -march=$your_cpu_family -pipe -fomit-frame-pointer

-pipe
    Use pipes rather than temporary files for communication between the
    various stages of compilation. This fails to work on some systems
    where the assembler is unable to read from a pipe; but the GNU
    assembler has no trouble.

-O also turns on -fomit-frame-pointer on machines where doing so does
not interfere with debugging. (However, x86 is not one of these
machines, so if you are not a developer doing debugging, you can turn
it on yourself for a slight additional speed increase.)

-fomit-frame-pointer
    Don't keep the frame pointer in a register for functions that don't
    need one. This avoids the instructions to save, set up and restore
    frame pointers; it also makes an extra register available in many
    functions.

> - don't do anything with ASFLAGS, LDFLAGS. This causes weird random
> breakage (e.g. LDFLAGS="-O1" causes prelink to fail with "absurd"
> errors) and doesn't give a noticeable performance boost

Correct. Also, running prelink can improve speed at the cost of disk
space.

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

I typically use the same hdparm settings as listed in the Handbook:

disc0_args="-d1 -A1 -m16 -u1 -a64 -c1"
cdrom0_args="-d1 -c1"

> - kernel tweaks like preempt may increase the responsiveness of the
> system, but often reduce throughput and may have unexpected sideeffects
> like random audio stutter as well as random kernel crashes ;-)

This is especially true on non-x86 architectures.

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks except microbenchmarks (can create 1M files 10% faster!!!!! -
> yes, but how does it behave with a normal workload?)

CFQ is much worse for a desktop system. I tend to like deadline for
playing games.
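For anyone who wants to check which I/O scheduler a disk is actually
using before comparing them, here is a minimal sketch. It assumes a 2.6
kernel with sysfs mounted on /sys and a disk named hda (both are
assumptions, adjust to taste). active_scheduler is a made-up helper name
for this example, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the active I/O scheduler from a sysfs
# scheduler line such as "noop anticipatory deadline [cfq]".
# The kernel marks the active scheduler with square brackets.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Assumed device name; change hda to match your disk.
dev=hda
sched_file="/sys/block/$dev/queue/scheduler"

# Only report if the kernel exposes the scheduler file for this device.
if [ -r "$sched_file" ]; then
    echo "$dev: $(active_scheduler "$(cat "$sched_file")")"
fi

# To switch at runtime (as root), write the name back, e.g.:
#   echo deadline > /sys/block/hda/queue/scheduler
```

The runtime switch does not survive a reboot, which makes it handy for
trying a scheduler under your own workload before committing to it.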
These can probably make a bit more difference than a new
-fomg-itsofast-and-broken-math added to CFLAGS.

> - using a "smarter" filesystem can dramatically improve performance at
> the potential cost of reliability. As data on FS reliability is hard to
> find from unbiased sources this becomes a religious issue ... migrating
> from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
> reiserfs sustainable?

Well, reiserfs 3 isn't so bad on architectures where it doesn't vomit
all over itself immediately. Also, reiserfs loses much of its luster if
you're running ext3 with dir_index. There was a tip in the GWN about
turning on dir_index on an already formatted file system. If formatting
a new one, just use

mkfs.ext2 -j -O dir_index /dev/$whatever

to create your file system.

> Are there any application-specific tweaks (e.g. "use the prefork MPM
> with apache2")? What is known to break things, what has usually
> beneficial behaviour? Are there any useful benchmarks that show the
> performance difference between different settings?

Well, turning on SBA and Fast Writes on Nvidia always helps. As for
benchmarks, I think the issue is that it depends entirely on usage.
Having something that is 30% faster on paper isn't very useful if you
never do it the way the benchmark does.

I wish I had more numbers/examples here, but there isn't really much in
the way of decent benchmarks published and readily available. Hopefully
some other people will know of more of them than I do.

-- 
Chris Gianelloni
Release Engineering - Strategic Lead
x86 Architecture Team
Games - Developer
Gentoo Linux