Dear Wolfgang Denk, On 27 July 2013 03:06, Wolfgang Denk <w...@denx.de> wrote: > Dear Andy Green, > > In message > <caafg0w7p6rvvamqsksc09yqnfpod08ydtbeh_mqrz6n+b4i...@mail.gmail.com> you > wrote: >> >> > Instead of making assumptions on the performance of memcpy() and >> >> As I wrote, I measured the performance and got a very big gain, it's >> 3x faster on my setup to use memcpy() then default memmove(). > > Yes, in your single test case of copying a Linux kernel image, > i. e. a multi-megabyte file.
That "single test case" of copying a "multi-megabyte file" is the reason d'etre of a bootloader...if it doesn't perform well doing that then it's a problem. However I agree the alternative arch-specific implementation for ARM is pretty good, so this is kind of moot. Since I traced my problem to here I "fixed" it there, but that actual problem was were using the default implementations at all (the config was inherited).. >> By calling that an "assumption" you're saying that there exist >> platforms where 32-bit linear memmove() is slower than doing it with >> 8-bit actions? > > No. I said you should not assume that memcpy() is always faster than > memmove(); a system may use optiomized versions of either. I did not assume that, I looked at your code for both and saw, and proved, that using 32-bit operations for mass move actions is going to perform better than using 8-bit operations. That's not something you can write off as an "assumption", it's a fact. >> > adding the overhead of an additional function call (which can be >> > expensive especially for short copy operations) it would make more >> >> I am not sure U-Boot is really in the business of doing small >> memmoves, but okay... > > It's easy to avoid this overhead, and also get rid of the > restrictions you built into it (otimizong only the non-overlapping > case), so if we touch that code, we should do it right. Given that code should perferably never be used, maybe it should print a warning like "Using default memory ops" and leave it like it is. The problem is not correctness just inefficiency. >> > sense to pull the "copy a word at a time" code from memcpy() into >> > memmove(), too. >> > >> > On the other hand - if you really care about performance, then why do >> >> I spent several hours figuring out why our NOR boot performance was >> terrible.... it's because this default memmove code is gloriously >> inefficient for all cases. >> >> If you like it like that, no worries. > > Don't twist my words. I asked for a different, better implementation, > that's all. Unfortunately I'm only looking at it because it made trouble, and we now no longer use that code. For the use-case I'm studying (very fast overall boot) it still may make sense to implement the NEON stuff in which case I'll offer a patch for that. -Andy > > Best regards, > > Wolfgang Denk > > -- > DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de > The only time the world beats a path to your door is when you are in > the bathroom. _______________________________________________ linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev