On Sun, Apr 21, 2024 at 10:08:11AM +0200, Wolfgang Hospital wrote:
> I've been tinkering around, the "ldi r_cnt, 9""rjmp entry point" in
> __udivmodqi4 instead of "ldi r_cnt, 8""lsl r_arg1" annoying me for
> years. (Biggest relative strict improvement I found, FWIW.)
The number of loop iterations is constant, so you could actually unroll the loop to make it even faster (and larger; that may no longer be an issue if you have enough program memory. Back when most of that code was written, the choice of AVR chips with large flash was much more limited, so it was more important to optimize for size even at the cost of speed).

There could be some win in size/speed of the code calling __udivmodqi4 from not clobbering r23 (r_cnt would no longer be needed), but my memory is vague on the correct way to tell GCC about it (the last time I worked on it was 2005 or so).

Actually, the original reason I started hacking on avr-gcc around 1999 was that I had a project to finish that barely fit in 8K flash (AT90S8535); the next larger device had 128K (ATmega103) but would have required a complete board redesign, and the new pin-compatible ATmega163 came out much later.

Marek
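
To illustrate the unrolling idea, here is a rough sketch in C rather than AVR assembly. It is not the libgcc code: the names udivmod8_loop, udivmod8_unrolled, DIV_STEP and udivmod8_selfcheck are made up for this example, and the remainder is kept in a wider variable where the assembly would rely on the carry flag. It only shows that a counted 8-step shift-and-subtract loop and its unrolled form (no counter, so nothing like r_cnt to clobber) compute the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch: quotient and remainder. */
typedef struct {
    uint8_t quot;
    uint8_t rem;
} udivmod8_t;

/* One shift-and-subtract step: move the top bit of n into rem,
 * then subtract d from rem if it fits and record a quotient bit
 * in the freed low bit of n. */
#define DIV_STEP(n, rem, d)                          \
    do {                                             \
        (rem) = ((rem) << 1) | (unsigned)((n) >> 7); \
        (n) = (uint8_t)((n) << 1);                   \
        if ((rem) >= (d)) {                          \
            (rem) -= (d);                            \
            (n) |= 1u;                               \
        }                                            \
    } while (0)

/* Looped version: needs a counter (the role r_cnt plays in assembly). */
static udivmod8_t udivmod8_loop(uint8_t n, uint8_t d)
{
    unsigned rem = 0; /* wider than 8 bits so the shift cannot overflow */
    for (uint8_t cnt = 8; cnt != 0; cnt--) {
        DIV_STEP(n, rem, d);
    }
    return (udivmod8_t){ n, (uint8_t)rem };
}

/* Unrolled version: the iteration count is a constant 8, so the
 * counter disappears at the cost of repeating the step body. */
static udivmod8_t udivmod8_unrolled(uint8_t n, uint8_t d)
{
    unsigned rem = 0;
    DIV_STEP(n, rem, d);
    DIV_STEP(n, rem, d);
    DIV_STEP(n, rem, d);
    DIV_STEP(n, rem, d);
    DIV_STEP(n, rem, d);
    DIV_STEP(n, rem, d);
    DIV_STEP(n, rem, d);
    DIV_STEP(n, rem, d);
    return (udivmod8_t){ n, (uint8_t)rem };
}

/* Exhaustive check of both versions against C's / and %
 * for every dividend and every nonzero divisor. */
static void udivmod8_selfcheck(void)
{
    for (unsigned n = 0; n < 256; n++) {
        for (unsigned d = 1; d < 256; d++) {
            udivmod8_t a = udivmod8_loop((uint8_t)n, (uint8_t)d);
            udivmod8_t b = udivmod8_unrolled((uint8_t)n, (uint8_t)d);
            assert(a.quot == n / d && a.rem == n % d);
            assert(b.quot == a.quot && b.rem == a.rem);
        }
    }
}
```

Whether the unrolled form is worth it on a given AVR part depends on how much flash is free; the sketch says nothing about actual cycle counts or code size.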