Patrick Lauer posted <[EMAIL PROTECTED]>, excerpted below, on Thu, 15 Dec 2005 13:48:05 +0100:
> I was wondering if there are any sane ways to optimize the performance
> of a Gentoo system.

This really belongs on user, or perhaps on the appropriate purposed
list, desktop or hardened or whatever, not on devel. That said, some
comments... (I can't resist. <g>)

> Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
> make things unstable, which is of course not what we want. The "easy"
> way out would be buying faster hardware, but that is usually not an
> option ;-)
>
> So ... what can be done to get the stable maximum out of your hardware?
>
> In my experience (x86 centric - do other arches have different
> "problems"?) the following is stable, but not necessarily the optimum:

The general rules are the same, but there are architectural differences
that often change the details. I /think/ it was MIPS that has extremely
slow i/o (I saw that mentioned in the split-kde-ebuilds debate; it was
said it could cause compile times to double -- a big thing for something
as big as KDE). x86 (32-bit) has a relatively small number of CPU
registers compared to most other archs (amd64 in 64-bit mode increases
the number dramatically, tho it's the same in 32-bit mode for
compatibility reasons), and this has a big effect on register use
strategy.

That said, in the general case, the -march switch normally chooses
pretty good defaults for the target arch. Modifying them much beyond
that, other than to cover special cases or with the general -Ox
optimization switches, is therefore often counterproductive and/or
problematic.

> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

A lot of folks don't realize the effect of cache memory on
optimizations. I'll be brief here, but particularly for things like the
kernel that stay in memory, -Os can at times work wonders, because it
means more of the working set stays in a cache closer to the CPU, and
the additional speed in retrieving that code far outweighs the
compromises made to optimization in order to shrink it to size.
Conversely, media streaming or encoding apps are constantly throwing
out old data and fetching new, and the optimizations are often more
effective for them, so they work better with -O2 or even -O3.

There have been occasional problems with -Os, generally because it
isn't used as much and gets less testing, so problems tend to show up
early in a gcc release series. However, I run -Os here (amd64) by
default, and haven't seen any issues that went away if I reverted to
-O2, over the couple years I've been running Gentoo. (That has been the
case even when I've edited ebuilds to remove their stripflags calls and
the like. Glibc and xorg both strip flags, including -Os. xorg seemed
to benefit here from -Os after I removed the stripflags call, while
glibc worked but seemed slower. Note that editing ebuilds means if it
breaks, you get to keep the pieces!)

For gcc, -pipe doesn't improve program optimization, but will make
compiling faster. -fomit-frame-pointer makes smaller applications if
you aren't debugging. Those are both common enough to be fairly safe.
-frename-registers and -fweb may also be useful. (-fweb ceases to be so
on gcc4, however, because it is implemented differently.)
-funit-at-a-time (new to gcc-3.4, so don't try it with gcc-3.3) may
also be worth looking into, altho it's already enabled by -Os. These
latter flags are less commonly used, however, thus less well tested,
and may therefore cause very occasional problems. (-funit-at-a-time was
known to do so early in the 3.4 cycle, but those issues should have
long since been dealt with.)
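For concreteness, here's roughly what all that works out to in
make.conf terms. Treat it as a sketch of the above, not gospel:
-march=k8 is only an example (what an amd64 box might use), so
substitute your own CPU family, and leave the less-tested flags
commented out on anything you can't afford to break:

  # /etc/make.conf (illustrative; -march=k8 is an example, use yours)
  CFLAGS="-Os -march=k8 -pipe -fomit-frame-pointer"
  # Less common, less tested. Skip -fweb on gcc4 and -funit-at-a-time
  # on gcc-3.3, and skip all three on a conservative box:
  #CFLAGS="${CFLAGS} -frename-registers -fweb -funit-at-a-time"
  CXXFLAGS="${CFLAGS}"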
I consider those /reasonably/ conservative, and it's what I run. If I
were running a server, however, I'd probably run only -O2 and the first
two (-pipe and -fomit-frame-pointer).

Do some research on -Os, in any case. It could be well worth your time.

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

Agreed. The following suggestion does involve hardware, but not at a
real heavy cost, and the performance boost may be worth it: consider
running a RAID system. I recently switched to RAID, a four-disk setup:
raid1/mirrored for /boot, raid6 (for redundancy) for most of the
system, and raid0/striped (for speed) for /tmp, the portage dir, etc.
-- stuff that was either temporary anyway or could easily be
redownloaded. (Swap can also be striped: set up equal partitions on
each disk and give them equal priority in fstab.) I was very pleasantly
surprised at how much of a difference it made!

Cost, as I said, is reasonable, particularly if you have disks laying
around or can buy them used. Even buying, say, three 80-gig drives and
doing what I did, only with a raid5, is reasonable at the price of hard
drives these days. Unfortunately, if your board is still PATA, you can
only run a single disk per IDE channel or it bogs down, so you may need
to buy a PCI IDE expansion board, which will add to the cost. If you
have onboard SATA and are buying new disks so you can buy SATA anyway
(my case), that should do just fine, as SATA runs a dedicated channel
to each drive. SCSI is a higher-cost option, ruled out here, but SATA
works very nicely, certainly so for me.

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks

Again, a reasonable new-hardware suggestion: when purchasing a new
system or considering an upgrade, more memory is often the most
effective optimization you can make (with the raid suggestion above
very close to it). Slower CPU and more memory, up to a gig or so, is
almost always better than the reverse, because hard drive access is
WAYYY slower than even cheap/slow memory. At a gig of memory, running
with swap disabled is actually a practical option, altho it might not
be faster, and there are certain memory zone management considerations.
Usual X/KDE desktop usage will run perhaps a third of a gig. That
leaves half to two-thirds of a gig for cache, which is "comfortable".

Naturally, if you take the RAID suggestion above, this one isn't quite
as critical, because drive latency will be lower, so reliance on swap
isn't as painful and a big cache isn't nearly as critical to good
performance. A gig to two gigs can still be useful, but the
cost/performance tradeoff isn't as good, and the money will likely be
better spent elsewhere.

Note that with a gig of memory and striped swap, I have swappiness
upped to 100 to force the most unused app memory to swap, and I
literally can't tell when it starts swapping at all, except by watching
the used-swap graph in ksysguard -- none at all of the slowdowns I had
previously associated with swapping, back when I had a single drive and
a half-gig of memory.
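For the record, the knobs involved look something like the below. This
is a sketch only: the device names (/dev/hda, /dev/sda2, /dev/sdb2) and
the readahead value are made-up examples for illustration, so adjust
them for your own disks and workload:

  # Check whether DMA is on (look for "using_dma = 1"), enable if not:
  hdparm -d /dev/hda
  hdparm -d1 /dev/hda

  # Raise readahead (in 512-byte sectors); the gain is app-specific:
  hdparm -a1024 /dev/hda

  # Striped swap: equal-priority entries in /etc/fstab:
  #   /dev/sda2   none   swap   sw,pri=1   0 0
  #   /dev/sdb2   none   swap   sw,pri=1   0 0

  # Swappiness, 0-100 (I run 100 here, given the striped swap):
  echo 100 > /proc/sys/vm/swappiness

  # The I/O scheduler can be chosen with a kernel boot parameter,
  # e.g. elevator=deadline or elevator=cfq.

Benchmark against your own workload, of course; as you say, there
aren't many "real" numbers out there.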
> - using a "smarter" filesystem can dramatically improve performance
> at the potential cost of reliability. As data on FS reliability is
> hard to find from unbiased sources this becomes a religious issue ...
> migrating from ext3 to reiserfs makes "emerge sync" extremely much
> faster, but is reiserfs sustainable?

I run reiserfs here on everything. However, some don't consider it
all that stable. I keep second-copy partitions as backups of stuff I
want to ensure is safe, for that reason and others (fat-finger
deleting, anyone?). Bottom line, reiserfs is certainly safe "enough" if
you have a decent backup system in place and follow it regularly, as
you should. I can't see how anyone can reasonably disagree with that,
filesystem religious zealotry or not.

In any case, note that you can simply redownload your portage tree
anyway, and with the speed and size benefits of reiserfs (size only if
you don't have notail in your mount options), even those least likely
to trust the integrity of reiserfs should see the benefit of putting
the portage tree on it. /tmp and/or /var/tmp may benefit equally, for
the same reasons.

An exception might be if you regularly put huge files (700-meg CD and
multi-gig DVD images to burn would be one example) on the partition. In
that case, jfs or xfs (I don't remember which, but one of them is
optimized for large files) might be preferable. As I said, I run
reiserfs for everything here, but I also keep backup images of stuff I
know I want to keep.

> Are there any application-specific tweaks

As I mentioned, -O3 is often best for multimedia stuff,
encoders/decoders/streamers and the like, while -O2, or often -Os, is
better for most everything else.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

-- 
gentoo-dev@gentoo.org mailing list