On Mon, Feb 02, 2015 at 07:28:14PM +0200, Pantelis Antoniou wrote:
> Hi Tom,
>
> > On Feb 2, 2015, at 19:25 , Tom Rini <tr...@ti.com> wrote:
> >
> > On Sun, Feb 01, 2015 at 03:38:42AM +0100, Albert ARIBAUD wrote:
> >> Hello Przemyslaw,
> >>
> >> On Wed, 28 Jan 2015 13:55:42 +0100, Przemyslaw Marczak
> >> <p.marc...@samsung.com> wrote:
> >>> For ARM architecture, enable the CONFIG_USE_ARCH_MEMSET/MEMCPY,
> >>> will highly increase the memset/memcpy performance. This is able
> >>> thanks to the ARM multiple register instructions.
> >>>
> >>> Unfortunatelly the relocation is done without the cache enabled,
> >>> so it takes some time, but zeroing the BSS memory takes much more
> >>> longer, especially for the configs with big static buffers.
> >>>
> >>> A quick test confirms, that the boot time improvement after using
> >>> the arch memcpy for relocation has no significant meaning.
> >>> The same test confirms that enable the memset for zeroing BSS,
> >>> reduces the boot time.
> >>>
> >>> So this patch enables the arch memset for zeroing the BSS after
> >>> the relocation process. For ARM boards, this can be enabled
> >>> in board configs by defining: 'CONFIG_USE_ARCH_MEMSET'.
> >>
> >> Since the issue is that zeroing is done one word at a time, could we
> >> not simply clear r3 as well as r2 (possibly even r4 and r5 too) and do
> >> a double (possibly quadruple) write loop? That would avoid calling a
> >> libc routine from the almost sole file in U-Boot where a C environment
> >> is not necessarily granted.
> >
> > So this brings up something I've wondered about for a long while. We
> > have arch/arm/lib/mem{set,cpy}.S which are old copies from the linux
> > kernel. The kernel uses them for all ARM platforms. Why do we not
> > always use these functions? I have a very vague notion it was a size
> > thing…
>
> That is a good question. Are we being hobbled cause of MLO? If so we can
> use the short (and slow) methods in that case and use the fast methods
> in the normal case. It seems that this is warranted in this case.

I'm not sure, but I can test easily enough. But even then we may want
to opt a few targets in to the current (slow) path and make the default
the optimized path.

> However in the particular case of dfu I think it’s best to avoid the large
> static buffers. Or if we do use the large buffers let’s put them in a
> linker segment that does not get zeroed on start.

Yes, I owe the rest of the series my attention too :)

-- 
Tom
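
For illustration only, here is roughly what Albert's double/quadruple
write idea amounts to, written as a C sketch rather than the assembly it
would really have to be (the point of his suggestion is precisely that a
C environment may not be available at that stage). The function names
zero_words_single and zero_words_quad are hypothetical and are not taken
from the U-Boot tree; the sketch assumes a word-aligned region whose
start and end pointers belong to the same buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one 32-bit store per loop iteration (the "slow" path). */
static void zero_words_single(uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}

/*
 * Hypothetical unrolled variant: four stores per iteration, followed by
 * a word-at-a-time tail for the 0..3 words that remain. In assembly the
 * four stores could be a single multiple-register store.
 */
static void zero_words_quad(uint32_t *start, uint32_t *end)
{
    /* end - start is signed, so an empty or inverted range is skipped. */
    while (end - start >= 4) {
        start[0] = 0;
        start[1] = 0;
        start[2] = 0;
        start[3] = 0;
        start += 4;
    }

    /* Tail: clear whatever is left, one word at a time. */
    while (start < end)
        *start++ = 0;
}
```

Both functions clear exactly the words in [start, end) and nothing
outside that range; the unrolled one just does fewer loop iterations.
Whether that is measurably faster with the cache off would need the
same kind of boot-time test as in the patch description.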
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot