Michael Meissner wrote:
On Fri, Jul 25, 2008 at 09:08:42AM +0200, Agner Fog wrote:
GNU libc could borrow a lot of optimized functions from OpenSolaris,
Mac, and other open source projects. They look better than GNU libc, but
there is still room for improvement. For example, OpenSolaris does not
use XMM registers for strlen, although this is simpler than using
general purpose registers (see my code at www.agner.org/optimize/asmlib.zip).
Note, glibc can only take code that is appropriately licensed and donated to
the FSF. In addition it must meet the coding standards for glibc.
The Mac/Xnu and OpenSolaris projects have fairly liberal public
licenses. If there are legal differences, maybe the copyright owner is
open to negotiation. My own code is under the GPL. The fact that I am
offering my code to you also means, of course, that I am willing to
grant the necessary license.
Also note that which approach is fastest for a given operation depends on
the underlying chip (for example, using XMM registers is not faster on
current AMD platforms).
Indeed. That's why I am talking about CPU dispatching (i.e. different
branches for different CPUs). The CPU dispatching can be done with just
a single jump instruction:
At the function entry there is an indirect jump through a pointer to the
appropriate version. The code pointer initially points to a CPU
dispatcher. The CPU dispatcher detects which CPU it is running on, and
replaces the code pointer with a pointer to the appropriate version,
then jumps to the pointer. The next time the function is called, it
follows the pointer directly to the right version.
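
To make the mechanism concrete, here is a minimal sketch in C rather than
assembly. It is not the code from asmlib; the names (my_memcpy, copy_ptr,
detect_cpu, dispatch_calls) are made up for illustration, and detect_cpu is
only a stub standing in for a real CPUID query. The "XMM" version is likewise
a placeholder that just calls the library memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU classes; a real dispatcher would use CPUID. */
enum cpu_kind { CPU_GENERIC, CPU_WIDE_SIMD };

/* Assumption: stub detection that always reports a generic CPU. */
static enum cpu_kind detect_cpu(void) { return CPU_GENERIC; }

/* Counts dispatcher runs, only to show that it executes once. */
int dispatch_calls = 0;

/* Version 1: plain byte loop, standing in for the general-register code. */
static void *copy_generic(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

/* Version 2: placeholder for an XMM implementation. */
static void *copy_simd(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

static void *copy_dispatch(void *dst, const void *src, size_t n);

/* The code pointer initially points to the dispatcher. */
static void *(*copy_ptr)(void *, const void *, size_t) = copy_dispatch;

/* First call only: pick a version, replace the pointer, then jump to it. */
static void *copy_dispatch(void *dst, const void *src, size_t n) {
    dispatch_calls++;
    copy_ptr = (detect_cpu() == CPU_WIDE_SIMD) ? copy_simd : copy_generic;
    return copy_ptr(dst, src, n);
}

/* Function entry: a single indirect jump through the code pointer. */
void *my_memcpy(void *dst, const void *src, size_t n) {
    return copy_ptr(dst, src, n);
}
```

After the first call to my_memcpy, copy_ptr no longer points to the
dispatcher, so later calls cost only the indirect jump. The sketch ignores
thread safety: two threads entering the dispatcher at once would both store
the same pointer value, which is harmless in practice on x86 but is not
guaranteed by the C standard.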
My memcpy runs faster with XMM registers than with 64-bit x64 registers
on AMD K8.
My strlen runs slower with XMM registers than with 64-bit x64 registers
on AMD K8.
I expect the XMM versions to run much faster on AMD K10, because it has
full 128-bit execution units and data paths, whereas K8 has only 64-bit
ones. I have not had the chance to test this on AMD K10 yet.
I believe it is best to optimize for the newest processors, because the
processor that is brand new today will become mainstream in a few years.
Memcpy/memset optimizations were added to glibc 2.8, though when your favorite
distribution will provide it is a different question:
http://sourceware.org/ml/libc-alpha/2008-04/msg00050.html
I have libc version 2.7. I can't find version 2.8.