Dear all,
G-J Lay has been kind enough to turn my whine about __udivmodqi4 into a
bug report and to handle it; I tried to follow suit by reporting further
strict improvements (no resource used more, at least one used less).
While I think the bug keyword "missed-optimization" is meant for
opportunities missed during compilation, I have no problem regarding
strictly sub-optimal library code as a missed optimization.
But what about speed improvements that take more instructions and/or
stack, or that are slower for some argument values? To start with: a
same-size __mulqi3 that is faster for all multipliers but zero, for which
it is slower; or a __mulhi3 whose worst case is about twice as fast, but
which is 3 instructions longer than the current code (both pointless for
cores with mul, obviously). Or division routines: a faster one that is no
larger "without movw", but uses one more return address on the stack; one
that is 2 instructions smaller and a wee bit faster on average, but slower
in the worst case; one that is about 14 cycles faster, but 1 instruction
longer?
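
To make the __mulqi3 kind of trade-off concrete, here is a rough C
sketch of an 8-bit shift-and-add multiply in two loop shapes. It is only
an illustration: it is neither the current libgcc assembly nor the
proposed replacement, the function names are made up for this example,
and how the difference translates into AVR cycles is an assumption. The
top-tested loop returns at once for a zero multiplier; the bottom-tested
loop performs one loop test fewer for every non-zero multiplier, but
runs one full iteration when the multiplier is zero.

```c
#include <stdint.h>

/* Hypothetical name. Top-tested loop: a zero multiplier exits
   after a single test, without entering the loop body. */
uint8_t mul8_top_tested(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b != 0) {
        if (b & 1)
            r += a;      /* add the shifted multiplicand for a set bit */
        a <<= 1;         /* truncated to 8 bits on assignment */
        b >>= 1;
    }
    return r;            /* (a * b) modulo 256 */
}

/* Hypothetical name. Bottom-tested loop: one loop test fewer for
   non-zero multipliers, one wasted iteration for zero. */
uint8_t mul8_bottom_tested(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    do {
        if (b & 1)
            r += a;
        a <<= 1;
        b >>= 1;
    } while (b != 0);
    return r;            /* same result as the top-tested variant */
}
```

Both variants return the same product for every pair of operands; they
differ only in how much work is done for which multiplier values.
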
How important is arithmetic for longer operands?
Regards,
W. Hospital
--
Wolfgang Hospital