sped-up functions from lib1funcs.S: what about using more instructions and/or stack?

Wolfgang Hospital Tue, 14 May 2024 23:49:10 -0700

 Dear all,

G-J Lay has been kind enough to turn my whine about __udivmodqi4 into a bug report and handle that; I tried to follow suit reporting further strict improvements (NO resource used more, at least one used less). While I think bug keyword "missed-optimization" is for missing opportunities during compilation, I have no problem regarding strictly sub-optimal library code as a missed optimization.

But what about speed improvements that take more instructions and/or stack, or are slower for some argument values? Starting with a same size __mulqi3 faster for all multipliers but zero, for which it is slower, or a __mulhi3 with worst case about twice as fast, but 3 instructions longer than the current code (both pointless for cores with mul, obviously). Or division routines: a faster one that is no larger "without movw", but uses one more return address on stack; one that is 2 instructions smaller, a wee bit faster on average, but slower worst case; one that's about 14 cycles faster, but 1 instruction longer?
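To illustrate why run time can depend on the argument values at all, here is a minimal C sketch of a generic shift-and-add 8-bit multiply. It is not the lib1funcs.S code and not any of the proposed replacements; the function name mul8_shift_add and the iterations counter are made up for illustration, with the counter serving only as a rough stand-in for cycles.

```c
#include <stdint.h>

/* Generic shift-and-add multiply on 8-bit operands (hypothetical helper,
 * NOT the libgcc implementation). The loop ends as soon as no multiplier
 * bits remain, so the number of passes depends on the multiplier value. */
uint8_t mul8_shift_add(uint8_t multiplier, uint8_t multiplicand,
                       unsigned *iterations)
{
    uint8_t result = 0;
    unsigned n = 0;

    while (multiplier != 0) {
        if (multiplier & 1)                      /* low bit set: add partial product */
            result = (uint8_t)(result + multiplicand);
        multiplicand = (uint8_t)(multiplicand << 1);
        multiplier >>= 1;                        /* move on to the next bit */
        n++;
    }

    *iterations = n;                             /* loop passes actually taken */
    return result;                               /* low 8 bits of the product */
}
```

In this sketch a zero multiplier takes no loop passes, while any multiplier with bit 7 set takes eight. Restructuring such a loop can shift cost between the common and the rare cases, which is the kind of trade-off asked about above.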


How important is arithmetic for longer operands?

regards

W. Hospital

--
Wolfgang Hospital
