On 06/23/2010 11:38 PM, Colin Watson wrote:
> With this approach, one of the most noticeable time sinks is that
> setting a graphical video mode (I'm using the VBE backend) takes ages:
> 1.6 seconds, which is a substantial percentage of this project's total
> boot time.  It turns out that most of this is spent initialising
> double-buffering: doublebuf_pageflipping_init calls
> grub_video_fb_create_render_target_from_pointer twice, and each call
> takes a little over 600 milliseconds.  Now,
> grub_video_fb_create_render_target_from_pointer is basically just a big
> grub_memset to clear framebuffer memory, so this equates to under two
> frames per second.  What's going on?
>
> It turns out that write caching is disabled on video memory when GRUB is
> running, so we take a cache stall on every single write, and it's
> apparently hard to enable caching without implementing MTRRs.  People
> who know more about this than I do tell me that this can get
> unpleasantly CPU-specific at times, although I still hold out some hope
> that it's possible in GRUB.
>

On non-device memory GRUB should take advantage of the cache. On MIPS,
enabling or disabling the cache is done by using a different address, so
all the infrastructure necessary for differentiating cacheable from
non-cacheable memory is already present. Enabling the cache on video
memory is, however, more trouble. One of the reasons is that cache
mishandling produces difficult bugs.

> However, there's a way to substantially speed things up without that.
> The naïve implementation of grub_memset writes a byte at a time, and for
> that matter on i386 it compiles to a poorly-optimised loop rather than
> using REP STOS or similar.  grub_memset is an inner loop practically by
> definition, and it's worth optimising.  We can fix both of these
> weaknesses by importing the optimised memset from GNU libc: since it
> writes four bytes at a time except (sometimes) at the start and end, it
> should take about a quarter the number of cache stalls.
> And, indeed,
> measurement bears this out: instead of taking over 600 milliseconds per
> call to grub_video_fb_create_render_target_from_pointer (I think it was
> actually 630 or so, though I neglected to write that down), GRUB now
> takes about 160 milliseconds per call.  Much better!
>
> The optimised memset is LGPLv2.1 or later, and I've preserved that
> notice, but as far as I know this should be fine for use in GRUB; it can
> be upgraded to LGPLv3, and that's just GPLv3 with some additional
> permissions.  It's already assigned to the FSF due to being in glibc.
>

It's OK to use this code, but be sure to mention its origin. It's also
OK to keep its license unless a big divergence is to be expected.
Did you test it on x86_64?

> +void *
> +grub_memset (void *s, int c, grub_size_t n)
> +{
> +  unsigned char *p = (unsigned char *) s;
> +
> +  while (n--)
> +    *p++ = (unsigned char) c;
> +
> +  return s;
> +}
>

This can be optimised the same way as the i386 part: just replace stos
with a loop that iterates over a pointer aligned on its own size.

> Thanks,
>

-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko
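
The suggestion above (replace stos with a loop over a pointer aligned on
its own size) might look roughly like the following. This is only a
sketch under stated assumptions, not the patch under discussion: the
names sketch_memset and word_t are hypothetical, size_t stands in for
grub_size_t, and the may_alias attribute assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word type.  may_alias (a GCC/Clang extension, assumed
   here) lets us store words into memory of any declared type.  */
typedef unsigned long __attribute__ ((__may_alias__)) word_t;

/* Hypothetical stand-in for a generic grub_memset; size_t is used in
   place of grub_size_t so the sketch compiles outside GRUB.  */
void *
sketch_memset (void *s, int c, size_t n)
{
  unsigned char *p = (unsigned char *) s;
  unsigned char byte = (unsigned char) c;

  /* Head: store single bytes until p is aligned on the word size.  */
  while (n > 0 && ((uintptr_t) p & (sizeof (word_t) - 1)) != 0)
    {
      *p++ = byte;
      n--;
    }

  /* Body: replicate the byte into every byte of a word, then store
     whole aligned words.  ~0UL / 0xff is 0x0101...01.  */
  if (n >= sizeof (word_t))
    {
      unsigned long pattern = (~0UL / 0xff) * byte;
      word_t *w = (word_t *) p;

      while (n >= sizeof (word_t))
        {
          *w++ = pattern;
          n -= sizeof (word_t);
        }
      p = (unsigned char *) w;
    }

  /* Tail: store whatever bytes remain after the last full word.  */
  while (n--)
    *p++ = byte;

  return s;
}
```

With uncached video memory, the point of the word loop is that each
store covers sizeof (word_t) bytes instead of one, so the number of
stalled writes should drop by roughly that factor for large buffers.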
_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel