See also https://docs.google.com/file/d/0B6dMB5dovDUZRlhzdlZWTk9mTWc/edit?usp=sharing (this is gcc-optimize-bug.txt)
I have this relatively straightforward implementation of a couple of pins' worth of software PWM:

    void pwmcycle(void)
    {
        unsigned char pwm1, pwm2, pwm3, pwm4, pwm5, level_delay;
        unsigned char pwm_delay;

        getbright();
        pwm1 = bright1;
        pwm2 = bright2;
        pwm3 = bright3;
        pwm4 = bright4;
        pwm5 = bright5;
        led_all_on();
        for (pwm_delay = 128; pwm_delay != 0; pwm_delay--) {
            /*
             * Rather standard software PWM loop.
             */
            if (--pwm1 == 0) {
                led1_off();
            }
            if (--pwm2 == 0) {
                led2_off();
            }
            if (--pwm3 == 0) {
                led3_off();
            }
            if (--pwm4 == 0) {
                led4_off();
            }
            if (--pwm5 == 0) {
                led5_off();
            }
        }
    }

When compiled with avr-gcc 4.6.2, it produces rather strange (but correct) code for the loop:

    /usr/local/CrossPack-AVR-20121207/bin/avr-gcc -c -mmcu=atmega8 -g -Os \
        gcc-optimize-bug.c -save-temps=obj -o gcc-optimize-bug-Os.o

       c:   00 d0           rcall   .+0             ; 0xe <pwmcycle+0xe>
       e:   c0 91 00 00     lds     r28, 0x0000     ;;pwm1
      12:   f0 90 00 00     lds     r15, 0x0000     ;;pwm2
      16:   00 91 00 00     lds     r16, 0x0000     ;;pwm3
      1a:   10 91 00 00     lds     r17, 0x0000     ;;pwm4
      1e:   d0 91 00 00     lds     r29, 0x0000     ;;pwm5
      22:   00 d0           rcall   .+0             ; 0x24 <pwmcycle+0x24>
      24:   80 e8           ldi     r24, 0x80       ; 128
      26:   e8 2e           mov     r14, r24
      28:   fc 1a           sub     r15, r28
      2a:   0c 1b           sub     r16, r28
      2c:   1c 1b           sub     r17, r28
      2e:   dc 1b           sub     r29, r28
      30:   c1 50           subi    r28, 0x01       ; 1
      32:   01 f4           brne    .+0             ; 0x34 <pwmcycle+0x34>
      34:   00 d0           rcall   .+0             ; 0x36 <pwmcycle+0x36>
      36:   8f 2d           mov     r24, r15
      38:   8c 0f           add     r24, r28
      3a:   01 f4           brne    .+0             ; 0x3c <pwmcycle+0x3c>
      3c:   00 d0           rcall   .+0             ; 0x3e <pwmcycle+0x3e>
      3e:   80 2f           mov     r24, r16
      40:   8c 0f           add     r24, r28
      42:   01 f4           brne    .+0             ; 0x44 <pwmcycle+0x44>
      44:   00 d0           rcall   .+0             ; 0x46 <pwmcycle+0x46>
        :

I guess this is some sort of loop optimization. I don't like that it's so obscured from the original, but it's also not very "good."
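Reading the listing, the loop seems to keep only pwm1 as a real counter: before the loop, pwm2 through pwm5 each have pwm1 subtracted (the "sub rN, r28" lines), and inside the loop each test adds the current pwm1 back (mov/add/brne). That is an interpretation of the generated code, not something confirmed from the GCC sources. A minimal C sketch of the two forms follows. The helper names off_step_direct, off_step_rebased, and forms_agree are hypothetical and exist only to show that both forms switch a channel off at the same step.

```c
#include <stdint.h>

/* Direct form, as in the source: the channel has its own down-counter.
 * Returns the loop step (0..127) at which the counter reaches zero,
 * or -1 if it never does within the 128 steps. */
int off_step_direct(uint8_t start)
{
    uint8_t pwm = start;
    for (uint8_t i = 0; i < 128; i++) {
        if (--pwm == 0)
            return i;
    }
    return -1;
}

/* Rebased form, mirroring what the -Os listing appears to do:
 * only pwm1 is decremented; the other channel is kept as a fixed
 * offset from pwm1 and the test adds the two back together. */
int off_step_rebased(uint8_t start, uint8_t pwm1_start)
{
    uint8_t delta = (uint8_t)(start - pwm1_start); /* sub rN, r28 */
    uint8_t pwm1 = pwm1_start;
    for (uint8_t i = 0; i < 128; i++) {
        pwm1--;                                    /* subi r28, 1 */
        if ((uint8_t)(delta + pwm1) == 0)          /* mov, add, brne */
            return i;
    }
    return -1;
}

/* Returns 1 if both forms agree for every pair of 8-bit start values. */
int forms_agree(void)
{
    for (int s = 0; s < 256; s++) {
        for (int p = 0; p < 256; p++) {
            if (off_step_direct((uint8_t)s) !=
                off_step_rebased((uint8_t)s, (uint8_t)p))
                return 0;
        }
    }
    return 1;
}
```

The two forms are equivalent in modulo-256 arithmetic, but the rebased one costs an extra mov and add per channel on an 8-bit core, which matches the larger listing. If this reading is right, the documented GCC flag -fno-ivopts might be a narrower switch to try than disabling all tree loop optimization; that has not been checked against this file.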
I can get more obvious, and significantly smaller/faster, code by turning off tree-loop-optimize (note that -ftree-loop-optimize is turned ON by default starting at -O1):

    /usr/local/CrossPack-AVR-20121207/bin/avr-gcc -c -mmcu=atmega8 -g -Os \
        gcc-optimize-bug.c -fno-tree-loop-optimize -save-temps=obj \
        -o gcc-optimize-bug-notree.o

       c:   00 d0           rcall   .+0             ; 0xe <pwmcycle+0xe>
       e:   e0 90 00 00     lds     r14, 0x0000
      12:   f0 90 00 00     lds     r15, 0x0000
      16:   00 91 00 00     lds     r16, 0x0000
      1a:   10 91 00 00     lds     r17, 0x0000
      1e:   d0 91 00 00     lds     r29, 0x0000
      22:   00 d0           rcall   .+0             ; 0x24 <pwmcycle+0x24>
      24:   c0 e8           ldi     r28, 0x80       ; 128
      26:   ea 94           dec     r14
      28:   01 f4           brne    .+0             ; 0x2a <pwmcycle+0x2a>
      2a:   00 d0           rcall   .+0             ; 0x2c <pwmcycle+0x2c>
      2c:   fa 94           dec     r15
      2e:   01 f4           brne    .+0             ; 0x30 <pwmcycle+0x30>
      30:   00 d0           rcall   .+0             ; 0x32 <pwmcycle+0x32>
      32:   01 50           subi    r16, 0x01       ; 1
      34:   01 f4           brne    .+0             ; 0x36 <pwmcycle+0x36>
      36:   00 d0           rcall   .+0             ; 0x38 <pwmcycle+0x38>
        :

I found http://gcc.gnu.org/onlinedocs/gccint/Tree-SSA-passes.html where they describe the optimizations done in tree-ssa-loop.c, which I assume is what is controlled here. Some of them sound useful. But it also looks like a case where high-level optimizations aimed at processors with vectorization capabilities (?) are making it difficult for code generators on smaller processors with the usual instruction sets to generate good code.

Is there anything that can be done? Can vectorizing optimizations (if they turn out to be guilty) be turned off for processors that don't have any vectorization ability?

Full source, intermediate, object, and list files are on Google Docs:
https://docs.google.com/file/d/0B6dMB5dovDUZRlhzdlZWTk9mTWc/edit?usp=sharing

(FWIW, I get the same sort of non-optimal obfuscation using the msp430-gcc compiler, which is also based on 4.6.x)
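One possible way to limit the workaround to the affected function, instead of passing -fno-tree-loop-optimize for the whole file, is GCC's "optimize" function attribute. The sketch below is a cut-down, two-channel stand-in for pwmcycle() and has not been tried with avr-gcc 4.6.2. The names pwmcycle_sketch, off_step, and NO_LOOP_OPT are hypothetical, and off_step replaces the ledN_off() calls so the example is self-contained. The GCC manual also cautions that this attribute is meant mainly for debugging, so treat it as an experiment.

```c
#include <stdint.h>

/* Hypothetical stand-in for led1_off()/led2_off(): records the value of
 * pwm_delay at which each channel would have been switched off. */
uint8_t off_step[2];

/* Apply the attribute only under GCC; other compilers just get the
 * plain function. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_OPT __attribute__((optimize("no-tree-loop-optimize")))
#else
#define NO_LOOP_OPT
#endif

NO_LOOP_OPT
void pwmcycle_sketch(uint8_t bright1, uint8_t bright2)
{
    uint8_t pwm1 = bright1;
    uint8_t pwm2 = bright2;
    uint8_t pwm_delay;

    off_step[0] = 0;
    off_step[1] = 0;
    for (pwm_delay = 128; pwm_delay != 0; pwm_delay--) {
        /* Same shape as the original software PWM loop. */
        if (--pwm1 == 0) {
            off_step[0] = pwm_delay;
        }
        if (--pwm2 == 0) {
            off_step[1] = pwm_delay;
        }
    }
}
```

Whether this gives the same code as the command-line flag would need to be confirmed by comparing the listings.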
_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list