Georg-Johann Lay wrote: > Paulo Marques schrieb: >> Hi, all >> >> Some time ago I toyed with trying to build a super-optimizer for the >> avr. The project didn't went very far, but it produced a few small bits >> of code that I think are worth sharing. >> >> Basically I was trying to produce functions to multiply an 8 bit value >> by 8 bit constants with a 16 bit result in avr families without a mul >> instruction. > > Hi Paulo,
Hi, Georg > can your optimizer produce results for other arithmetic than > multiplication? Yes, with a few tweaks. > For example to get a sequence for > > R >>>= 2 > > where R is an 8-bit register and >>> stands for "rotate right". > There is a 5-insn sequence > > SWAP R > LSL R > ADC R,Z > LSL R > ADC R,Z > > where Z means "a register containing 0", i.e. >>> 2 is performed as > <<< 4 > <<< 1 > <<< 1 > > With MUL instruction there is > > LDI U,64 > MUL R,U > OR R0,R1 > MOV R,R0 > > but that needs an upper register U and for avr-gcc the zero-register R1 > must be cleared: > > CLR Z > > so that this sequence is not shorter but only slower. The question is > if there exists a shorter sequence and maybe your tool knows some magic > to give the answer? The tool can not give a definite answer, because it only uses a subset of all available instructions (and it might have bugs). I tweaked it for the 2 bit rotation, and with the subset I usually use it couldn't find any sequence better than yours: the best sequences used 5 instructions and 2 registers. I then increased the subset of instructions to give it more options, but it still could not find a smaller sequence... :( It did found a lot of 5 instruction sequences. Here are a few examples: MOV R2, R ASR R2 ROR R ASR R2 ROR R This also uses an extra register, but doesn't require it to be initialized to zero. Since we have a zero register available most of the time, this is really no advantage over your version. ADD R, R ADC R, Z ADD R, R ADC R, Z SWAP R This is basically the same as yours, although the SWAP being done in the end makes the code harder to follow ;) I've attached the output of the optimizer. The R1 register is assumed to have zero when starting and R0 has the value to be shifted. As you can see many of the 81 sequences are equivalent. If you want me to try different problems, just let me know. I'm always curious about what kind of weird sequences the optimizer will produce :) -- Paulo Marques Software Development Department - Grupo PIE, S.A. Phone: +351 252 290600, Fax: +351 252 290601 Web: www.grupopie.com I have not yet begun to procrastinate
ADC R0 R0 ADC R0 R1 ADC R0 R0 ADC R0 R1 SWAP R0 ADC R0 R0 ADC R0 R1 ADD R0 R0 ADC R0 R1 SWAP R0 ADC R0 R0 ADC R0 R1 SWAP R0 ADC R0 R0 ADC R0 R1 ADC R0 R0 ADC R0 R1 SWAP R0 ADD R0 R0 ADC R0 R1 ADC R1 R0 ASR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 ADC R1 R0 ASR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 ADC R1 R0 ASR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ADC R1 R0 LSR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 ADC R1 R0 LSR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 ADC R1 R0 LSR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ADC R1 R0 ROR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 ADC R1 R0 ROR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 ADC R1 R0 ROR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 . ADD R0 R0 ADC R0 R1 ADC R0 R0 ADC R0 R1 SWAP R0 ADD R0 R0 ADC R0 R1 ADD R0 R0 ADC R0 R1 SWAP R0 ADD R0 R0 ADC R0 R1 SWAP R0 ADC R0 R0 ADC R0 R1 ADD R0 R0 ADC R0 R1 SWAP R0 ADD R0 R0 ADC R0 R1 ADD R1 R0 ASR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 ADD R1 R0 ASR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 ADD R1 R0 ASR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ADD R1 R0 LSR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 ADD R1 R0 LSR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 ADD R1 R0 LSR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ADD R1 R0 ROR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 ADD R1 R0 ROR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 ADD R1 R0 ROR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 .. EOR R1 R0 ASR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 EOR R1 R0 ASR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 EOR R1 R0 ASR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 EOR R1 R0 LSR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 EOR R1 R0 LSR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 EOR R1 R0 LSR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 EOR R1 R0 ROR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 EOR R1 R0 ROR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 EOR R1 R0 ROR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 MOV R1 R0 ASR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 MOV R1 R0 ASR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 MOV R1 R0 ASR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 MOV R1 R0 LSR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 MOV R1 R0 LSR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 MOV R1 R0 LSR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 MOV R1 R0 ROR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 MOV R1 R0 ROR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 MOV R1 R0 ROR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 OR R1 R0 ASR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 OR R1 R0 ASR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 OR R1 R0 ASR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 OR R1 R0 LSR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 OR R1 R0 LSR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 OR R1 R0 LSR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 OR R1 R0 ROR R1 R1 ROR R0 R1 ASR R1 R1 ROR R0 R1 OR R1 R0 ROR R1 R1 ROR R0 R1 LSR R1 R1 ROR R0 R1 OR R1 R0 ROR R1 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 .. LSR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 ADC R0 R1 LSR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 ADD R0 R1 LSR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 EOR R0 R1 LSR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 OR R0 R1 LSR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 ADC R0 R1 LSR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 ADD R0 R1 LSR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 EOR R0 R1 LSR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 OR R0 R1 LSR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 ADC R0 R1 LSR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 ADD R0 R1 LSR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 EOR R0 R1 LSR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 OR R0 R1 . ROR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 ADC R0 R1 ROR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 ADD R0 R1 ROR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 EOR R0 R1 ROR R0 R1 ROR R1 R1 ASR R0 R1 ROR R1 R1 OR R0 R1 ROR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 ADC R0 R1 ROR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 ADD R0 R1 ROR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 EOR R0 R1 ROR R0 R1 ROR R1 R1 LSR R0 R1 ROR R1 R1 OR R0 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 ADC R0 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 ADD R0 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 EOR R0 R1 ROR R0 R1 ROR R1 R1 ROR R0 R1 ROR R1 R1 OR R0 R1 SWAP R0 ADC R0 R0 ADC R0 R1 ADC R0 R0 ADC R0 R1 SWAP R0 ADC R0 R0 ADC R0 R1 ADD R0 R0 ADC R0 R1 SWAP R0 ADD R0 R0 ADC R0 R1 ADC R0 R0 ADC R0 R1 SWAP R0 ADD R0 R0 ADC R0 R1 ADD R0 R0 ADC R0 R1
_______________________________________________ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list