Patrick Lauer posted <[EMAIL PROTECTED]>, excerpted below, on Thu, 15 Dec 2005 13:48:05 +0100:
> I was wondering if there are any sane ways to optimize the performance
> of a Gentoo system.

This really belongs on user, or perhaps on the appropriate purposed
list, desktop or hardened or whatever, not on devel. That said, some
comments... (I can't resist. <g>)

> Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
> make things unstable, which is of course not what we want. The "easy"
> way out would be buying faster hardware, but that is usually not an
> option ;-)
>
> So ... what can be done to get the stable maximum out of your hardware?
>
> In my experience (x86 centric - do other arches have different
> "problems"?) the following is stable, but not necessarily the optimum:

The general rules are the same, but there are architectural differences
that often change the details. I /think/ it was MIPS that has extremely
slow i/o (I saw that mentioned in the split-kde-ebuilds debate; it was
said it could cause compile times to double -- a big thing for something
as big as KDE). x86 (32-bit) has a relatively small number of CPU
registers compared to most other archs (amd64 in 64-bit mode increases
the number dramatically, tho it's the same in 32-bit mode for
compatibility reasons), and this has a big effect on register use
strategy.

That said, in the general case, the -march switch normally chooses
pretty good defaults for the target arch. Modifying them much beyond
that, other than to cover special cases or with the general -Ox
optimization switches, is therefore often counterproductive and/or
problematic.

> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

A lot of folks don't realize the effect of cache memory on
optimizations. I'll be brief here, but particularly for things like the
kernel that stay in memory, -Os can at times work wonders, because it
means more of the working set stays in a cache closer to the CPU, and
the additional speed in retrieving that code far outweighs the
compromises made to optimization in order to shrink it to size.
Conversely, media streaming or encoding apps are constantly throwing
out old data and fetching new, and the optimizations are often more
effective for them, so they work better with -O2 or even -O3.

There have been occasional problems with -Os, generally because it
isn't used as much and gets less testing, so problems tend to show up
early in a gcc release series. However, I run -Os here (amd64) by
default, and haven't seen any issues that went away if I reverted to
-O2, over the couple years I've been running Gentoo. (That has been the
case even when I've edited ebuilds to remove their stripflags calls and
the like. Glibc and xorg both strip flags, including -Os. xorg seemed
to benefit here from -Os after I removed the stripflags call, while
glibc worked but seemed slower. Note that editing ebuilds means if it
breaks, you get to keep the pieces!)

For gcc, -pipe doesn't improve program optimization, but will make
compiling faster. -fomit-frame-pointer makes smaller applications if
you aren't debugging. Those are both common enough to be fairly safe.
-frename-registers and -fweb may also be useful. (-fweb ceases to be so
on gcc4, however, because it is implemented differently.)
-funit-at-a-time (new to gcc-3.4, so don't try it with gcc-3.3) may
also be worth looking into, altho it's already enabled by -Os. These
latter flags are less commonly used, however, thus less well tested,
and may therefore cause very occasional problems. (-funit-at-a-time was
known to do so early in the 3.4 cycle, but those issues should have
long since been dealt with.)
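For concreteness, here's roughly what all that works out to in
make.conf terms. Treat it as a sketch of the above, not gospel:
-march=k8 is only an example (what an amd64 box might use), so
substitute your own CPU family, and leave the less-tested flags
commented out on anything you can't afford to break:

  # /etc/make.conf (illustrative; -march=k8 is an example, use yours)
  CFLAGS="-Os -march=k8 -pipe -fomit-frame-pointer"
  # Less common, less tested. Skip -fweb on gcc4 and -funit-at-a-time
  # on gcc-3.3, and skip all three on a conservative box:
  #CFLAGS="${CFLAGS} -frename-registers -fweb -funit-at-a-time"
  CXXFLAGS="${CFLAGS}"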
I consider those /reasonably/ conservative, and it's what I run. If I
were running a server, however, I'd probably run only -O2 and the first
two (-pipe and -fomit-frame-pointer).

Do some research on -Os, in any case. It could be well worth your time.

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

Agreed. The following suggestion does involve hardware, but not at a
real heavy cost, and the performance boost may be worth it: consider
running a RAID system. I recently switched to RAID, a four-disk setup:
raid1/mirrored for /boot, raid6 (for redundancy) for most of the
system, and raid0/striped (for speed) for /tmp, the portage dir, etc.
-- stuff that was either temporary anyway or could easily be
redownloaded. (Swap can also be striped: set up equal partitions on
each disk and give them equal priority in fstab.) I was very pleasantly
surprised at how much of a difference it made!

Cost, as I said, is reasonable, particularly if you have disks laying
around or can buy them used. Even buying, say, three 80-gig drives and
doing what I did, only with a raid5, is reasonable at the price of hard
drives these days. Unfortunately, if your board is still PATA, you can
only run a single disk per IDE channel or it bogs down, so you may need
to buy a PCI IDE expansion board, which will add to the cost. If you
have onboard SATA and are buying new disks so you can buy SATA anyway
(my case), that should do just fine, as SATA runs a dedicated channel
to each drive. SCSI is a higher-cost option, ruled out here, but SATA
works very nicely, certainly so for me.

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks

Again, a reasonable new-hardware suggestion: when purchasing a new
system or considering an upgrade, more memory is often the most
effective optimization you can make (with the raid suggestion above
very close to it). Slower CPU and more memory, up to a gig or so, is
almost always better than the reverse, because hard drive access is
WAYYY slower than even cheap/slow memory. At a gig of memory, running
with swap disabled is actually a practical option, altho it might not
be faster, and there are certain memory zone management considerations.
Usual X/KDE desktop usage will run perhaps a third of a gig. That
leaves half to two-thirds of a gig for cache, which is "comfortable".

Naturally, if you take the RAID suggestion above, this one isn't quite
as critical, because drive latency will be lower, so reliance on swap
isn't as painful and a big cache isn't nearly as critical to good
performance. A gig to two gigs can still be useful, but the
cost/performance tradeoff isn't as good, and the money will likely be
better spent elsewhere.

Note that with a gig of memory and striped swap, I have swappiness
upped to 100 to force the most unused app memory to swap, and I
literally can't tell when it starts swapping at all, except by watching
the used-swap graph in ksysguard -- none at all of the slowdowns I had
previously associated with swapping, back when I had a single drive and
a half-gig of memory.
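For the record, the knobs involved look something like the below. This
is a sketch only: the device names (/dev/hda, /dev/sda2, /dev/sdb2) and
the readahead value are made-up examples for illustration, so adjust
them for your own disks and workload:

  # Check whether DMA is on (look for "using_dma = 1"), enable if not:
  hdparm -d /dev/hda
  hdparm -d1 /dev/hda

  # Raise readahead (in 512-byte sectors); the gain is app-specific:
  hdparm -a1024 /dev/hda

  # Striped swap: equal-priority entries in /etc/fstab:
  #   /dev/sda2   none   swap   sw,pri=1   0 0
  #   /dev/sdb2   none   swap   sw,pri=1   0 0

  # Swappiness, 0-100 (I run 100 here, given the striped swap):
  echo 100 > /proc/sys/vm/swappiness

  # The I/O scheduler can be chosen with a kernel boot parameter,
  # e.g. elevator=deadline or elevator=cfq.

Benchmark against your own workload, of course; as you say, there
aren't many "real" numbers out there.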
> - using a "smarter" filesystem can dramatically improve performance
> at the potential cost of reliability. As data on FS reliability is
> hard to find from unbiased sources this becomes a religious issue ...
> migrating from ext3 to reiserfs makes "emerge sync" extremely much
> faster, but is reiserfs sustainable?

I run reiserfs here on everything. However, some don't consider it
all that stable. I keep second-copy partitions as backups of stuff I
want to ensure is safe, for that reason and others (fat-finger
deleting, anyone?). Bottom line, reiserfs is certainly safe "enough" if
you have a decent backup system in place and follow it regularly, as
you should. I can't see how anyone can reasonably disagree with that,
filesystem religious zealotry or not.

In any case, note that you can simply redownload your portage tree
anyway, and with the speed and size benefits of reiserfs (size only if
you don't have notail in your mount options), even those least likely
to trust the integrity of reiserfs should see the benefit of putting
the portage tree on it. /tmp and/or /var/tmp may benefit equally, for
the same reasons.

An exception might be if you regularly put huge files (700-meg CD and
multi-gig DVD images to burn would be one example) on the partition. In
that case, jfs or xfs (I don't remember which, but one of them is
optimized for large files) might be preferable. As I said, I run
reiserfs for everything here, but I also keep backup images of stuff I
know I want to keep.

> Are there any application-specific tweaks

As I mentioned, -O3 is often best for multimedia stuff,
encoders/decoders/streamers and the like, while -O2, or often -Os, is
better for most everything else.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

-- 
gentoo-dev@gentoo.org mailing list