On Tue, 30 Aug 2005, Knut Petersen wrote:
> > Probably you can make it even faster by avoiding the multiplication, like
> >
> >     unsigned int offset = 0;
> >     for (i = 0; i < image.height; i++) {
> >         dst[offset] = src[i];
> >         offset += pitch;
> >     }
>
> More than two decades ago I learned to avoid mul and imul. Use shifts, add
> and lea instead, that was the credo those days. The name of the game was
> CP/M 80/86, a86, d86 and ddt ;-)
>
> But let's get serious again.

On modern CPUs, a multiplication indeed takes 1 cycle, just like an addition.
But on older CPUs (still supported by Linux), this is not true.

> Your proposed change of the patch results in a 21 ms performance decrease
> on my system.
> Yes, I do know that this is hard to believe. I tested a similar variation
> before, and the results were even worse.
>
> Avoiding mul is a good idea in assembly language today, but often it is
> better to write a multiplication with the loop counter in C and not to
> introduce an extra variable instead. The compiler will optimize the code
> and it's easier for gcc without that extra variable.

But you are right. On actual inspection of the generated assembly code for a
very simple test case, it turns out both (m68k-linux-)gcc 2.95.2 and 3.3.3 are
smart enough to convert the multiplication to an addition...

And interestingly, if I avoid the multiplication explicitly, gcc 2.95.2 still
generates the same code, but 3.3.3 adds a few extra instructions to
save/restore local vars. So this probably explains why it turned out to be
slower for you. Ugh...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
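
For reference, the two loop shapes discussed in this message can be sketched as
below. This is only a minimal illustration, not the actual patch or the test
case that was inspected: the function names copy_column_mul and
copy_column_add are hypothetical, and height stands in for image.height from
the quoted snippet. Both functions write one source byte per row, pitch bytes
apart.

```c
#include <stddef.h>

/* Hypothetical helper: the row offset is computed by multiplying the
 * loop counter by the pitch on every iteration. */
void copy_column_mul(unsigned char *dst, const unsigned char *src,
                     unsigned int height, unsigned int pitch)
{
    unsigned int i;

    for (i = 0; i < height; i++)
        dst[(size_t)i * pitch] = src[i];
}

/* Hypothetical helper: same result, but the multiplication is replaced
 * by a running offset that is advanced by pitch each iteration. */
void copy_column_add(unsigned char *dst, const unsigned char *src,
                     unsigned int height, unsigned int pitch)
{
    size_t offset = 0;
    unsigned int i;

    for (i = 0; i < height; i++) {
        dst[offset] = src[i];
        offset += pitch;
    }
}
```

Compiling both with something like "gcc -O2 -S" and comparing the generated
assembly is one way to repeat the kind of inspection described above; what the
compiler emits will depend on the gcc version and the target.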