I've been working on a project involving fast boot speeds, which are timed from post-BIOS (this is an axiom of the project and not something I get to change). As such I've been working on optimising GRUB, and have been digging into it with this general debugging pattern (probably not optimal, I'm just showing it as an example of my approach in case anyone else is trying to do similar things):

#include <grub/time.h>
#include <grub/env.h>
#include <grub/misc.h>
#include <grub/mm.h>

void
grub_function (void)
{
  char *value, *valueend;

  /* Scratch buffer that accumulates the timestamps as text.  */
  value = grub_malloc (16384);
  if (! value)
    return;
  *value = '\0';
  valueend = value;

  grub_snprintf (valueend, 100, "%lld", grub_get_time_ms ());
  valueend = grub_strchr (valueend, '\0');
  /* do slow stuff */
  /* The leading space keeps successive timestamps apart.  */
  grub_snprintf (valueend, 100, " %lld", grub_get_time_ms ());
  valueend = grub_strchr (valueend, '\0');
  /* do more slow stuff */
  grub_snprintf (valueend, 100, " %lld", grub_get_time_ms ());
  valueend = grub_strchr (valueend, '\0');

  /* Stash the result in an environment variable for later inspection.  */
  grub_env_set ("identifying_name", value);
  grub_free (value);
}

This seems lightweight enough that it doesn't interfere much with timings, and the approach of stuffing things into an environment variable is useful because you can get millisecond-precision timings even for video initialisation.

With this approach, one of the most noticeable time sinks is that setting a graphical video mode (I'm using the VBE backend) takes ages: 1.6 seconds, which is a substantial percentage of this project's total boot time. It turns out that most of this is spent initialising double-buffering: doublebuf_pageflipping_init calls grub_video_fb_create_render_target_from_pointer twice, and each call takes a little over 600 milliseconds. Now, grub_video_fb_create_render_target_from_pointer is basically just a big grub_memset to clear framebuffer memory, so this equates to under two frames per second. What's going on?

It turns out that write caching is disabled on video memory when GRUB is running, so we take a cache stall on every single write, and it's apparently hard to enable caching without implementing MTRRs. People who know more about this than I do tell me that this can get unpleasantly CPU-specific at times, although I still hold out some hope that it's possible in GRUB. However, there's a way to substantially speed things up without that.

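The underlying idea is simply to issue fewer, wider stores. What follows is a minimal sketch in portable C, assuming 32-bit words. The names wide_memset and wide_memset_self_check are hypothetical and exist only for illustration; this is not the glibc code in the patch below, which uses REP STOS inline assembly instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GRUB or glibc: fill N bytes at S with
   byte C, using 32-bit stores for the aligned middle of the buffer.  */
void *
wide_memset (void *s, int c, size_t n)
{
  unsigned char *p = s;
  uint32_t word = (unsigned char) c;

  /* Replicate the fill byte into all four bytes of the word.  */
  word |= word << 8;
  word |= word << 16;

  /* Head: single bytes until P reaches a 4-byte boundary.  */
  while (n > 0 && (uintptr_t) p % sizeof (word) != 0)
    {
      *p++ = (unsigned char) c;
      n--;
    }

  /* Body: four bytes per store.  A fixed-size memcpy is the portable way
     to write this; compilers typically turn it into one 32-bit store.  */
  while (n >= sizeof (word))
    {
      memcpy (p, &word, sizeof (word));
      p += sizeof (word);
      n -= sizeof (word);
    }

  /* Tail: whatever is left, at most three bytes.  */
  while (n > 0)
    {
      *p++ = (unsigned char) c;
      n--;
    }

  return s;
}

/* Compare against the C library's memset over a range of start offsets
   and lengths, checking that bytes outside the range are untouched.  */
void
wide_memset_self_check (void)
{
  unsigned char got[64], want[64];
  size_t off, len;

  for (off = 0; off < 8; off++)
    for (len = 0; len + off <= 40; len++)
      {
        memset (got, 0x11, sizeof (got));
        memset (want, 0x11, sizeof (want));
        wide_memset (got + off, 0xAB, len);
        memset (want + off, 0xAB, len);
        assert (memcmp (got, want, sizeof (got)) == 0);
      }
}
```

For a large buffer nearly all of the work happens in the body loop, so the number of stores drops to roughly a quarter of what a byte-at-a-time loop issues. On memory where every store stalls, that is where the time goes.
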
The naïve implementation of grub_memset writes a byte at a time, and for that matter on i386 it compiles to a poorly-optimised loop rather than using REP STOS or similar. grub_memset is an inner loop practically by definition, and it's worth optimising. We can fix both of these weaknesses by importing the optimised memset from GNU libc: since it writes four bytes at a time except (sometimes) at the start and end, it should take about a quarter the number of cache stalls. And, indeed, measurement bears this out: instead of taking over 600 milliseconds per call to grub_video_fb_create_render_target_from_pointer (I think it was actually 630 or so, though I neglected to write that down), GRUB now takes about 160 milliseconds per call. Much better!

The optimised memset is LGPLv2.1 or later, and I've preserved that notice, but as far as I know this should be fine for use in GRUB; it can be upgraded to LGPLv3, and that's just GPLv3 with some additional permissions. It's already assigned to the FSF due to being in glibc.

2010-06-23  Colin Watson  <cjwat...@ubuntu.com>

	* conf/any-emu.rmk (kernel_img_SOURCES): Add kern/string.c.
	* conf/common.rmk (grub_mkdevicemap_SOURCES): Likewise.
	(grub_probe_SOURCES): Likewise.
	(grub_fstest_SOURCES): Likewise.
	(grub_script_check_SOURCES): Likewise.
	(grub_editenv_SOURCES): Likewise.
	* conf/mips-qemu-mips.rmk (kernel_img_SOURCES): Likewise.
	* conf/mips-yeeloong.rmk (kernel_img_SOURCES): Likewise.
	* conf/powerpc-ieee1275.rmk (kernel_img_SOURCES): Likewise.
	* conf/sparc64-ieee1275.rmk (kernel_img_SOURCES): Likewise.
	(grub_setup_SOURCES): Likewise.
	* conf/tests.rmk (example_unit_test_SOURCES): Likewise.
	* conf/i386-coreboot.rmk (kernel_img_SOURCES): Add
	kern/i386/string.c.
	* conf/i386-ieee1275.rmk (kernel_img_SOURCES): Likewise.
	* conf/i386-multiboot.rmk (kernel_img_SOURCES): Likewise.
	* conf/i386-pc.rmk (kernel_img_SOURCES): Likewise.
	(grub_setup_SOURCES): Likewise.
	* conf/i386-qemu.rmk (kernel_img_SOURCES): Likewise.
	* conf/x86-efi.rmk (kernel_img_SOURCES): Likewise.
	* kern/i386/string.c: New file.
	* kern/misc.c (grub_memset): Move to ...
	* kern/string.c (grub_memset): ... here.
	* kern/misc.c (memset): Move to ...
	* kern/string.c (memset): ... here.

=== modified file 'conf/any-emu.rmk'
--- conf/any-emu.rmk	2010-06-11 20:31:16 +0000
+++ conf/any-emu.rmk	2010-06-23 16:55:35 +0000
@@ -6,7 +6,7 @@ kernel_img_SOURCES = kern/device.c kern/
 	kern/err.c kern/list.c kern/command.c \
 	kern/corecmd.c kern/file.c kern/fs.c kern/main.c kern/misc.c \
 	kern/parser.c kern/partition.c kern/term.c \
-	kern/rescue_reader.c kern/rescue_parser.c \
+	kern/rescue_reader.c kern/rescue_parser.c kern/string.c \
 	\
 	kern/emu/main.c kern/emu/mm.c kern/emu/misc.c \
 	kern/emu/getroot.c kern/emu/time.c kern/emu/hostdisk.c \

=== modified file 'conf/common.rmk'
--- conf/common.rmk	2010-06-21 15:04:30 +0000
+++ conf/common.rmk	2010-06-23 21:04:23 +0000
@@ -7,7 +7,8 @@ sbin_UTILITIES += grub-mkdevicemap
 grub_mkdevicemap_SOURCES = gnulib/progname.c util/grub-mkdevicemap.c \
 	util/deviceiter.c \
 	util/misc.c kern/emu/misc.c \
-	kern/env.c kern/err.c kern/list.c kern/misc.c kern/emu/mm.c
+	kern/env.c kern/err.c kern/list.c kern/misc.c kern/string.c \
+	kern/emu/mm.c
 
 ifeq ($(target_cpu)-$(platform), sparc64-ieee1275)
 grub_mkdevicemap_SOURCES += util/ieee1275/ofpath.c util/ieee1275/devicemap.c
@@ -27,7 +28,7 @@ util/grub-probe.c_DEPENDENCIES = grub_pr
 grub_probe_SOURCES = gnulib/progname.c util/grub-probe.c \
 	kern/emu/hostdisk.c util/misc.c kern/emu/misc.c kern/emu/getroot.c kern/emu/mm.c \
 	kern/device.c kern/disk.c kern/err.c kern/misc.c \
-	kern/partition.c kern/file.c kern/list.c \
+	kern/partition.c kern/file.c kern/list.c kern/string.c \
 	\
 	fs/affs.c fs/cpio.c fs/fat.c fs/ext2.c fs/hfs.c \
 	fs/hfsplus.c fs/iso9660.c fs/udf.c fs/jfs.c fs/minix.c \
@@ -49,6 +50,7 @@ util/grub-fstest.c_DEPENDENCIES = grub_f
 grub_fstest_SOURCES = gnulib/progname.c util/grub-fstest.c kern/emu/hostfs.c \
 	util/misc.c kern/emu/misc.c kern/emu/mm.c \
 	kern/file.c kern/device.c kern/disk.c kern/err.c kern/misc.c \
+	kern/string.c \
 	disk/host.c disk/loopback.c kern/list.c kern/command.c \
 	lib/arg.c commands/extcmd.c normal/datetime.c normal/misc.c \
 	lib/hexdump.c lib/crc.c commands/blocklist.c commands/ls.c \
@@ -92,7 +94,7 @@ grub_script_check_SOURCES = gnulib/progn
 	util/grub-script-check.c util/misc.c kern/emu/misc.c kern/emu/mm.c \
 	script/main.c script/script.c script/function.c script/lexer.c \
 	kern/err.c kern/list.c \
-	kern/misc.c kern/env.c grub_script.tab.c \
+	kern/misc.c kern/env.c kern/string.c grub_script.tab.c \
 	grub_script.yy.c
 grub_script_check_CFLAGS = $(GNULIB_UTIL_CFLAGS)
 grub_script_check_DEPENDENCIES = grub_script.tab.h
@@ -160,7 +162,7 @@ DISTCLEANFILES += grub_fstest_init.c
 
 # for grub-editenv
 bin_UTILITIES += grub-editenv
-grub_editenv_SOURCES = gnulib/progname.c util/grub-editenv.c lib/envblk.c util/misc.c kern/emu/misc.c kern/emu/mm.c kern/misc.c kern/err.c
+grub_editenv_SOURCES = gnulib/progname.c util/grub-editenv.c lib/envblk.c util/misc.c kern/emu/misc.c kern/emu/mm.c kern/misc.c kern/err.c kern/string.c
 CLEANFILES += grub-editenv
 
 # Needed for genmk.rb to work

=== modified file 'conf/i386-coreboot.rmk'
--- conf/i386-coreboot.rmk	2010-06-11 20:31:16 +0000
+++ conf/i386-coreboot.rmk	2010-06-23 16:58:20 +0000
@@ -16,7 +16,7 @@ kernel_img_SOURCES = kern/i386/coreboot/
 	kern/rescue_parser.c kern/rescue_reader.c \
 	kern/time.c kern/list.c kern/command.c kern/corecmd.c \
 	kern/$(target_cpu)/dl.c kern/parser.c kern/partition.c \
-	kern/i386/tsc.c kern/i386/pit.c \
+	kern/i386/tsc.c kern/i386/pit.c kern/i386/string.c \
 	kern/generic/rtc_get_time_ms.c \
 	kern/generic/millisleep.c \
 	kern/env.c \

=== modified file 'conf/i386-ieee1275.rmk'
--- conf/i386-ieee1275.rmk	2010-06-11 20:31:16 +0000
+++ conf/i386-ieee1275.rmk	2010-06-23 16:58:48 +0000
@@ -19,6 +19,7 @@ kernel_img_SOURCES = kern/i386/ieee1275/
 	kern/$(target_cpu)/dl.c kern/parser.c kern/partition.c \
 	kern/env.c \
 	kern/time.c kern/list.c kern/command.c kern/corecmd.c \
+	kern/i386/string.c \
 	kern/generic/millisleep.c \
 	kern/ieee1275/ieee1275.c \
 	term/ieee1275/ofconsole.c \

=== modified file 'conf/i386-multiboot.rmk'
--- conf/i386-multiboot.rmk	2010-05-01 12:06:53 +0000
+++ conf/i386-multiboot.rmk	2010-06-23 16:58:56 +0000
@@ -18,7 +18,7 @@ kernel_img_SOURCES = kern/i386/coreboot/
 	kern/rescue_parser.c kern/rescue_reader.c \
 	kern/time.c kern/list.c kern/handler.c kern/command.c kern/corecmd.c \
 	kern/$(target_cpu)/dl.c kern/parser.c kern/partition.c \
-	kern/i386/tsc.c kern/i386/pit.c \
+	kern/i386/tsc.c kern/i386/pit.c kern/i386/string.c \
 	kern/generic/rtc_get_time_ms.c \
 	kern/generic/millisleep.c \
 	kern/env.c \

=== modified file 'conf/i386-pc.rmk'
--- conf/i386-pc.rmk	2010-06-12 11:17:28 +0000
+++ conf/i386-pc.rmk	2010-06-23 21:04:23 +0000
@@ -46,7 +46,7 @@ kernel_img_SOURCES = kern/i386/pc/startu
 	kern/time.c kern/list.c kern/command.c kern/corecmd.c \
 	kern/$(target_cpu)/dl.c kern/i386/pc/init.c kern/i386/pc/mmap.c \
 	kern/parser.c kern/partition.c \
-	kern/i386/tsc.c kern/i386/pit.c \
+	kern/i386/tsc.c kern/i386/pit.c kern/i386/string.c \
 	kern/generic/rtc_get_time_ms.c \
 	kern/generic/millisleep.c \
 	kern/env.c \
@@ -66,7 +66,7 @@ util/i386/pc/grub-setup.c_DEPENDENCIES =
 grub_setup_SOURCES = gnulib/progname.c util/i386/pc/grub-setup.c \
 	util/misc.c kern/emu/misc.c kern/emu/getroot.c \
 	kern/emu/hostdisk.c kern/device.c kern/disk.c kern/err.c \
-	kern/misc.c kern/partition.c kern/file.c \
+	kern/misc.c kern/partition.c kern/file.c kern/i386/string.c \
 	kern/emu/mm.c kern/fs.c kern/env.c kern/list.c fs/fshelp.c \
 	\
 	fs/affs.c fs/cpio.c fs/ext2.c fs/fat.c fs/hfs.c \

=== modified file 'conf/i386-qemu.rmk'
--- conf/i386-qemu.rmk	2010-06-11 20:31:16 +0000
+++ conf/i386-qemu.rmk	2010-06-23 16:59:14 +0000
@@ -25,7 +25,7 @@ kernel_img_SOURCES = kern/i386/qemu/star
 	kern/rescue_parser.c kern/rescue_reader.c \
 	kern/time.c kern/list.c kern/command.c kern/corecmd.c \
 	kern/$(target_cpu)/dl.c kern/parser.c kern/partition.c \
-	kern/i386/tsc.c kern/i386/pit.c \
+	kern/i386/tsc.c kern/i386/pit.c kern/i386/string.c \
 	kern/generic/rtc_get_time_ms.c \
 	kern/generic/millisleep.c \
 	kern/env.c \

=== modified file 'conf/mips-qemu-mips.rmk'
--- conf/mips-qemu-mips.rmk	2010-06-11 20:31:16 +0000
+++ conf/mips-qemu-mips.rmk	2010-06-23 16:59:28 +0000
@@ -11,7 +11,7 @@ kernel_img_SOURCES = kern/$(target_cpu)/
 	kern/$(target_cpu)/$(target_machine)/init.c \
 	kern/disk.c kern/dl.c kern/err.c kern/file.c kern/fs.c \
 	kern/misc.c kern/mm.c kern/term.c \
-	kern/rescue_parser.c kern/rescue_reader.c \
+	kern/rescue_parser.c kern/rescue_reader.c kern/string.c \
 	kern/list.c kern/command.c kern/corecmd.c \
 	kern/parser.c kern/partition.c kern/env.c kern/$(target_cpu)/dl.c \
 	kern/generic/millisleep.c kern/generic/rtc_get_time_ms.c kern/time.c \

=== modified file 'conf/mips-yeeloong.rmk'
--- conf/mips-yeeloong.rmk	2010-06-11 20:31:16 +0000
+++ conf/mips-yeeloong.rmk	2010-06-23 16:59:37 +0000
@@ -15,7 +15,7 @@ kernel_img_SOURCES = kern/$(target_cpu)/
 	kern/$(target_cpu)/$(target_machine)/init.c \
 	kern/disk.c kern/dl.c kern/err.c kern/file.c kern/fs.c \
 	kern/misc.c kern/mm.c kern/term.c \
-	kern/rescue_parser.c kern/rescue_reader.c \
+	kern/rescue_parser.c kern/rescue_reader.c kern/string.c \
 	kern/list.c kern/command.c kern/corecmd.c \
 	kern/parser.c kern/partition.c kern/env.c kern/$(target_cpu)/dl.c \
 	kern/generic/millisleep.c kern/generic/rtc_get_time_ms.c kern/time.c \

=== modified file 'conf/powerpc-ieee1275.rmk'
--- conf/powerpc-ieee1275.rmk	2010-06-11 20:31:16 +0000
+++ conf/powerpc-ieee1275.rmk	2010-06-23 16:59:58 +0000
@@ -12,7 +12,7 @@ kernel_img_SOURCES = kern/powerpc/ieee12
 	kern/ieee1275/ieee1275.c kern/main.c kern/device.c \
 	kern/disk.c kern/dl.c kern/err.c kern/file.c kern/fs.c \
 	kern/misc.c kern/mm.c kern/term.c \
-	kern/rescue_parser.c kern/rescue_reader.c \
+	kern/rescue_parser.c kern/rescue_reader.c kern/string.c \
 	kern/list.c kern/command.c kern/corecmd.c \
 	kern/ieee1275/init.c \
 	kern/ieee1275/mmap.c \

=== modified file 'conf/sparc64-ieee1275.rmk'
--- conf/sparc64-ieee1275.rmk	2010-06-11 20:31:16 +0000
+++ conf/sparc64-ieee1275.rmk	2010-06-23 17:00:18 +0000
@@ -25,7 +25,7 @@ kernel_img_SOURCES = kern/sparc64/ieee12
 	kern/ieee1275/ieee1275.c kern/main.c kern/device.c \
 	kern/disk.c kern/dl.c kern/err.c kern/file.c kern/fs.c \
 	kern/misc.c kern/mm.c kern/term.c \
-	kern/rescue_parser.c kern/rescue_reader.c \
+	kern/rescue_parser.c kern/rescue_reader.c kern/string.c \
 	kern/list.c kern/command.c kern/corecmd.c \
 	kern/sparc64/ieee1275/ieee1275.c \
 	kern/sparc64/ieee1275/init.c \
@@ -48,7 +48,7 @@ util/sparc64/ieee1275/grub-setup.c_DEPEN
 grub_setup_SOURCES = util/sparc64/ieee1275/grub-setup.c \
 	util/ieee1275/ofpath.c util/misc.c kern/emu/hostdisk.c \
 	kern/emu/misc.c kern/emu/getroot.c kern/emu/mm.c kern/device.c \
-	kern/disk.c kern/err.c kern/misc.c \
+	kern/disk.c kern/err.c kern/misc.c kern/string.c \
 	kern/partition.c kern/file.c kern/fs.c kern/env.c kern/list.c \
 	fs/fshelp.c \
 	\

=== modified file 'conf/tests.rmk'
--- conf/tests.rmk	2010-04-30 08:20:41 +0000
+++ conf/tests.rmk	2010-06-23 17:00:25 +0000
@@ -21,7 +21,7 @@ functional_test_mod_LDFLAGS = $(COMMON_L
 
 # Rules for unit tests
 check_UTILITIES += example_unit_test
-example_unit_test_SOURCES = tests/example_unit_test.c kern/list.c kern/misc.c tests/lib/test.c tests/lib/unit_test.c
+example_unit_test_SOURCES = tests/example_unit_test.c kern/list.c kern/misc.c kern/string.c tests/lib/test.c tests/lib/unit_test.c
 example_unit_test_CFLAGS = -Wno-format
 
 # Rules for functional tests

=== modified file 'conf/x86-efi.rmk'
--- conf/x86-efi.rmk	2010-06-12 11:17:28 +0000
+++ conf/x86-efi.rmk	2010-06-23 21:04:23 +0000
@@ -26,7 +26,7 @@ kernel_img_SOURCES = kern/$(target_cpu)/
 	kern/env.c symlist.c kern/efi/efi.c kern/efi/init.c kern/efi/mm.c \
 	term/efi/console.c disk/efi/efidisk.c \
 	kern/time.c kern/list.c kern/command.c kern/corecmd.c \
-	kern/i386/tsc.c kern/i386/pit.c \
+	kern/i386/tsc.c kern/i386/pit.c kern/i386/string.c \
 	kern/generic/rtc_get_time_ms.c \
 	kern/generic/millisleep.c
 ifeq ($(target_cpu),x86_64)

=== added file 'kern/i386/string.c'
--- kern/i386/string.c	1970-01-01 00:00:00 +0000
+++ kern/i386/string.c	2010-06-23 17:03:10 +0000
@@ -0,0 +1,96 @@
+/*
+ * GRUB -- GRand Unified Bootloader
+ * Copyright (C) 2002,2003,2004,2005,2006,2007,2008,2009,2010 Free Software Foundation, Inc.
+ *
+ * This optimised memset implementation originates in GNU libc. Its
+ * copyright and licensing information follow.
+ *
+ * Set a block of memory to some byte value.
+ * For Intel 80x86, x>=3.
+ * Copyright (C) 1991,1992,1993,1997,1998,2003, 2005 Free Software Foundation, Inc.
+ * This file is part of the GNU C Library.
+ * Contributed by Torbjorn Granlund (t...@sics.se).
+ *
+ * The GNU C Library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * The GNU C Library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with the GNU C Library; if not, write to the Free
+ * Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ * 02111-1307 USA.
+ */
+
+#include <grub/misc.h>
+
+#define OPSIZ (sizeof (unsigned long int))
+
+void *
+grub_memset (void *dstpp, int c, grub_size_t len)
+{
+  int d0;
+  unsigned long int dstp = (unsigned long int) dstpp;
+
+  /* This explicit register allocation
+     improves code very much indeed. */
+  register unsigned long int x asm("ax");
+
+  x = (unsigned char) c;
+
+  /* Clear the direction flag, so filling will move forward. */
+  asm volatile("cld");
+
+  /* This threshold value is optimal. */
+  if (len >= 12)
+    {
+      /* Fill X with four copies of the char we want to fill with. */
+      x |= (x << 8);
+      x |= (x << 16);
+
+      /* Adjust LEN for the bytes handled in the first loop. */
+      len -= (-dstp) % OPSIZ;
+
+      /* There are at least some bytes to set.
+         No need to test for LEN == 0 in this alignment loop. */
+
+      /* Fill bytes until DSTP is aligned on a longword boundary. */
+      asm volatile("rep\n"
+                   "stosb" /* %0, %2, %3 */ :
+                   "=D" (dstp), "=c" (d0) :
+                   "0" (dstp), "1" ((-dstp) % OPSIZ), "a" (x) :
+                   "memory");
+
+      /* Fill longwords. */
+      asm volatile("rep\n"
+                   "stosl" /* %0, %2, %3 */ :
+                   "=D" (dstp), "=c" (d0) :
+                   "0" (dstp), "1" (len / OPSIZ), "a" (x) :
+                   "memory");
+      len %= OPSIZ;
+    }
+
+  /* Write the last few bytes. */
+  asm volatile("rep\n"
+               "stosb" /* %0, %2, %3 */ :
+               "=D" (dstp), "=c" (d0) :
+               "0" (dstp), "1" (len), "a" (x) :
+               "memory");
+
+  return dstpp;
+}
+
+#ifndef APPLE_CC
+void *memset (void *s, int c, grub_size_t n)
+  __attribute__ ((alias ("grub_memset")));
+#else
+void *memset (void *s, int c, grub_size_t n)
+{
+  return grub_memset (s, c, n);
+}
+#endif

=== modified file 'kern/misc.c'
--- kern/misc.c	2010-05-28 13:48:45 +0000
+++ kern/misc.c	2010-06-23 17:01:49 +0000
@@ -506,26 +506,6 @@ grub_strndup (const char *s, grub_size_t
   return p;
 }
 
-void *
-grub_memset (void *s, int c, grub_size_t n)
-{
-  unsigned char *p = (unsigned char *) s;
-
-  while (n--)
-    *p++ = (unsigned char) c;
-
-  return s;
-}
-#ifndef APPLE_CC
-void *memset (void *s, int c, grub_size_t n)
-  __attribute__ ((alias ("grub_memset")));
-#else
-void *memset (void *s, int c, grub_size_t n)
-{
-  return grub_memset (s, c, n);
-}
-#endif
-
 grub_size_t
 grub_strlen (const char *s)
 {

=== added file 'kern/string.c'
--- kern/string.c	1970-01-01 00:00:00 +0000
+++ kern/string.c	2010-06-23 17:01:42 +0000
@@ -0,0 +1,42 @@
+/* string.c - definitions of string functions with architecture-specific
+   optimisations */
+/*
+ * GRUB -- GRand Unified Bootloader
+ * Copyright (C) 1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010 Free Software Foundation, Inc.
+ *
+ * GRUB is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * GRUB is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GRUB. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <grub/misc.h>
+
+void *
+grub_memset (void *s, int c, grub_size_t n)
+{
+  unsigned char *p = (unsigned char *) s;
+
+  while (n--)
+    *p++ = (unsigned char) c;
+
+  return s;
+}
+
+#ifndef APPLE_CC
+void *memset (void *s, int c, grub_size_t n)
+  __attribute__ ((alias ("grub_memset")));
+#else
+void *memset (void *s, int c, grub_size_t n)
+{
+  return grub_memset (s, c, n);
+}
+#endif

Thanks,

-- 
Colin Watson                                       [cjwat...@ubuntu.com]

_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel