On Thu, 2005-12-15 at 13:48 +0100, Patrick Lauer wrote:
> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

-O2 -march=$your_cpu_family -pipe -fomit-frame-pointer

-pipe
        Use pipes rather than temporary files for communication between
        the various stages of compilation. This fails to work on some
        systems where the assembler is unable to read from a pipe; but
        the GNU assembler has no trouble.

-O also turns on -fomit-frame-pointer on machines where doing so does
not interfere with debugging.

(However, x86 is not one of these machines, so -O2 alone does not enable
it there.  If you are not a developer who needs to debug, you can add it
explicitly for a slight additional speed increase.)

-fomit-frame-pointer
        Don't keep the frame pointer in a register for functions that
        don't need one. This avoids the instructions to save, set up and
        restore frame pointers; it also makes an extra register
        available in many functions.
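
For reference, this is roughly how it would look in /etc/make.conf.  It
is only a sketch: athlon-xp is a placeholder picked for illustration, so
substitute whatever -march value matches your own CPU family.

```sh
# /etc/make.conf fragment (sketch)
# Assumption: athlon-xp is only an example value for -march.
CFLAGS="-O2 -march=athlon-xp -pipe -fomit-frame-pointer"
# Reuse the same flags for C++ so the two never drift apart.
CXXFLAGS="${CFLAGS}"
```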

> - don't do anything with ASFLAGS, LDFLAGS. This causes weird random
> breakage (e.g. LDFLAGS="-O1" causes prelink to fail with "absurd"
> errors) and doesn't give a noticeable performance boost

Correct.

Also, running prelink can improve speed at the cost of disk space.

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

I typically use the same hdparm settings as listed in the Handbook:

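# -d1 DMA on, -A1 drive read-lookahead on, -m16 16-sector multiple I/O,
# -u1 unmask IRQs, -a64 64-sector filesystem readahead, -c1 32-bit I/O
disc0_args="-d1 -A1 -m16 -u1 -a64 -c1"
cdrom0_args="-d1 -c1"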
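
Checking whether DMA is actually on is quick: hdparm -d on the device
reports a using_dma line.  The helper below is a sketch that reads that
output on stdin.  dma_enabled is a made-up name, /dev/hda is an example
device, and the pattern assumes the usual "using_dma = 1 (on)" output.

```sh
# Succeed if the hdparm -d output on stdin reports DMA as enabled.
dma_enabled() {
    grep -q 'using_dma[[:space:]]*=[[:space:]]*1'
}

# Example usage (as root; /dev/hda is an assumed device name):
#   hdparm -d /dev/hda | dma_enabled && echo "DMA on" || echo "DMA off"
```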

> - kernel tweaks like preempt may increase the responsiveness of the
> system, but often reduce throughput and may have unexpected sideeffects
> like random audio stutter as well as random kernel crashes ;-)

This is especially true on non-x86 architectures.

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks except microbenchmarks (can create 1M files 10% faster!!!!! -
> yes, but how does it behave with a normal workload?)

CFQ is much worse for a desktop system.  I tend to like deadline for
playing games.  These can probably make a bit more difference than a new
-fomg-itsofast-and-broken-math added to CFLAGS.
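
To see which scheduler a disk is using, kernels that support switching
at runtime list the choices in /sys/block/<dev>/queue/scheduler, with
the active one in brackets.  A small sketch follows; active_scheduler is
a helper name invented here, and hda is just an example device.

```sh
# Print the active I/O scheduler from a sysfs scheduler file.
# The kernel marks the active entry with brackets, e.g.
#   noop anticipatory deadline [cfq]
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Example usage (hda is an assumed device name; switching needs root):
#   active_scheduler /sys/block/hda/queue/scheduler
#   echo deadline > /sys/block/hda/queue/scheduler
#   cat /proc/sys/vm/swappiness
```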

> - using a "smarter" filesystem can dramatically improve performance at
> the potential cost of reliability. As data on FS reliability is hard to
> find from unbiased sources this becomes a religious issue ... migrating
> from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
> reiserfs sustainable?

Well, reiserfs 3 isn't so bad on architectures where it doesn't vomit
all over itself immediately.  Also, reiserfs loses much of its luster if
you're running ext3 with dir_index.  There was a tip in the GWN about
turning on dir_index on an already formatted file system.  If formatting
a new one, just use mkfs.ext2 -j -O dir_index /dev/$whatever to create
your file system (lowercase -j adds the journal; uppercase -J expects
journal options as an argument).
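
For an already formatted file system, the usual e2fsprogs route is
tune2fs to set the feature, then e2fsck -fD to index the directories
that already exist.  This may or may not be exactly what the GWN tip
described.  The sketch below wraps both steps; enable_dir_index and the
DRY_RUN switch are names invented for illustration, and the file system
should be unmounted before running it for real.

```sh
# Enable dir_index on an existing ext2/ext3 file system (sketch).
# Set DRY_RUN=1 to print the commands instead of running them.
enable_dir_index() {
    dev=$1
    run=${DRY_RUN:+echo}
    # Turn the feature on; new directories will be indexed from now on.
    $run tune2fs -O dir_index "$dev" &&
    # Force a check (-f) and rebuild existing directories (-D).
    $run e2fsck -fD "$dev"
}

# Example usage (/dev/hda3 is an assumed, unmounted partition):
#   DRY_RUN=1; enable_dir_index /dev/hda3
```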

> Are there any application-specific tweaks (e.g. "use the prefork MPM
> with apache2")? What is known to break things, what has usually
> beneficial behaviour? Are there any useful benchmarks that show the
> performance difference between different settings?

Well, turning on SBA and Fast Writes on Nvidia always helps.  As for
benchmarks, I think the issue is it depends entirely on usage.  Having
something that is 30% faster on paper isn't very useful if you never do
it the way the benchmark does.  I wish I had more numbers/examples here,
but there isn't really much in the way of decent benchmarks published
and readily available.  Hopefully some other people will know of more of
them than I do.

-- 
Chris Gianelloni
Release Engineering - Strategic Lead
x86 Architecture Team
Games - Developer
Gentoo Linux
