[avr-gcc-list] avr-gcc sub-optimal code with -ftree-loop-optimize - fixable?

Bill Westfield Fri, 29 Mar 2013 21:28:44 -0700

See also
https://docs.google.com/file/d/0B6dMB5dovDUZRlhzdlZWTk9mTWc/edit?usp=sharing
(this is gcc-optimize-bug.txt)


I have this relatively straightforward implementation of a couple of pins'
worth of software PWM:

void pwmcycle(void)
{
    unsigned char pwm1, pwm2, pwm3, pwm4, pwm5, level_delay;
    unsigned char pwm_delay;

    getbright();
    pwm1 = bright1;
    pwm2 = bright2;
    pwm3 = bright3;
    pwm4 = bright4;
    pwm5 = bright5;
    led_all_on();
    for (pwm_delay = 128; pwm_delay != 0; pwm_delay--) {
        /*
         * Rather standard software PWM loop.
         */
        if (--pwm1 == 0) {
            led1_off();
        }
        if (--pwm2 == 0) {
            led2_off();
        }
        if (--pwm3 == 0) {
            led3_off();
        }
        if (--pwm4 == 0) {
            led4_off();
        }
        if (--pwm5 == 0) {
            led5_off();
        }
    }
}

When compiled with avr-gcc 4.6.2, it produces rather strange (but correct)
code for the loop:

/usr/local/CrossPack-AVR-20121207/bin/avr-gcc -c -mmcu=atmega8 -g -Os \
          gcc-optimize-bug.c  -save-temps=obj -o gcc-optimize-bug-Os.o

   c:    00 d0           rcall    .+0          ; 0xe <pwmcycle+0xe>
   e:    c0 91 00 00     lds    r28, 0x0000   ;;pwm1
  12:    f0 90 00 00     lds    r15, 0x0000   ;;pwm2
  16:    00 91 00 00     lds    r16, 0x0000   ;;pwm3
  1a:    10 91 00 00     lds    r17, 0x0000   ;;pwm4
  1e:    d0 91 00 00     lds    r29, 0x0000   ;;pwm5
  22:    00 d0           rcall    .+0          ; 0x24 <pwmcycle+0x24>
  24:    80 e8           ldi    r24, 0x80    ; 128
  26:    e8 2e           mov    r14, r24
  28:    fc 1a           sub    r15, r28
  2a:    0c 1b           sub    r16, r28
  2c:    1c 1b           sub    r17, r28
  2e:    dc 1b           sub    r29, r28
  30:    c1 50           subi    r28, 0x01    ; 1
  32:    01 f4           brne    .+0          ; 0x34 <pwmcycle+0x34>
  34:    00 d0           rcall    .+0          ; 0x36 <pwmcycle+0x36>
  36:    8f 2d           mov    r24, r15
  38:    8c 0f           add    r24, r28
  3a:    01 f4           brne    .+0          ; 0x3c <pwmcycle+0x3c>
  3c:    00 d0           rcall    .+0          ; 0x3e <pwmcycle+0x3e>
  3e:    80 2f           mov    r24, r16
  40:    8c 0f           add    r24, r28
  42:    01 f4           brne    .+0          ; 0x44 <pwmcycle+0x44>
  44:    00 d0           rcall    .+0          ; 0x46 <pwmcycle+0x46>
       :

I guess this is some sort of loop optimization. I don't like that it's so
obscured from the original, and it isn't very "good" code either. I can get
more readable, and significantly smaller and faster, code by turning off
tree-loop-optimize (note that -ftree-loop-optimize is turned ON by default
starting at -O1):

/usr/local/CrossPack-AVR-20121207/bin/avr-gcc -c -mmcu=atmega8 -g -Os \
          gcc-optimize-bug.c  -fno-tree-loop-optimize -save-temps=obj \
          -o gcc-optimize-bug-notree.o

   c:    00 d0           rcall    .+0          ; 0xe <pwmcycle+0xe>
   e:    e0 90 00 00     lds    r14, 0x0000
  12:    f0 90 00 00     lds    r15, 0x0000
  16:    00 91 00 00     lds    r16, 0x0000
  1a:    10 91 00 00     lds    r17, 0x0000
  1e:    d0 91 00 00     lds    r29, 0x0000
  22:    00 d0           rcall    .+0          ; 0x24 <pwmcycle+0x24>
  24:    c0 e8           ldi    r28, 0x80    ; 128
  26:    ea 94           dec    r14
  28:    01 f4           brne    .+0          ; 0x2a <pwmcycle+0x2a>
  2a:    00 d0           rcall    .+0          ; 0x2c <pwmcycle+0x2c>
  2c:    fa 94           dec    r15
  2e:    01 f4           brne    .+0          ; 0x30 <pwmcycle+0x30>
  30:    00 d0           rcall    .+0          ; 0x32 <pwmcycle+0x32>
  32:    01 50           subi    r16, 0x01    ; 1
  34:    01 f4           brne    .+0          ; 0x36 <pwmcycle+0x36>
  36:    00 d0           rcall    .+0          ; 0x38 <pwmcycle+0x38>
     :


I found http://gcc.gnu.org/onlinedocs/gccint/Tree-SSA-passes.html, which
describes the optimizations done in tree-ssa-loop.c; I assume those are what
this flag controls. Some of them sound useful. But this also looks like a
case where high-level optimizations aimed at processors with vectorization
capabilities (?) make it difficult for the code generators of smaller
processors with conventional instruction sets to produce good code. Is there
anything that can be done? If the vectorizing optimizations turn out to be
the culprit, can they be turned off for processors that have no
vectorization ability?

Full source, intermediate, object, and list files are on Google Docs:
https://docs.google.com/file/d/0B6dMB5dovDUZRlhzdlZWTk9mTWc/edit?usp=sharing

(FWIW, I get the same sort of non-optimal obfuscation with the msp430-gcc
compiler, which is also based on 4.6.x.)

_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
