Dear all,

G-J Lay has been kind enough to turn my whine about __udivmodqi4 into a bug report and to handle it; I have tried to follow suit by reporting further strict improvements (NO resource used more, at least one used less). While I think the bug keyword "missed-optimization" is meant for opportunities missed during compilation, I have no problem regarding strictly sub-optimal library code as a missed optimization.

But what about speed improvements that take more instructions and/or stack, or are slower for some argument values? To start with: a __mulqi3 of the same size that is faster for all multipliers except zero, for which it is slower; or a __mulhi3 that is about twice as fast in the worst case, but 3 instructions longer than the current code (both pointless for cores with mul, obviously). Or division routines: a faster one that is no larger "without movw", but uses one more return address on the stack; one that is 2 instructions smaller and a wee bit faster on average, but slower in the worst case; and one that is about 14 cycles faster, but 1 instruction longer?

How important is arithmetic on longer operands?

regards

W. Hospital

--
Wolfgang Hospital
