On 9/11/2013, at 12:08 am, Fredrik Olsson <pey...@gmail.com> wrote:

> I have this simple function:
> int sum_vec(int c, ...) {
>    va_list argptr;
>    va_start(argptr, c);
>    int sum = 0;
>    while (c--) {
>        int x = va_arg(argptr, int);
>        sum += x;
>    }
>    va_end(argptr);
>    return sum;
> }
> 
> 
> When compiling with "-fomit-frame-pointer -Os -march=68000 -c -S
> -mshort" I get this assembly (I have manually added comments with
> clock cycles per instruction and a total for a count of 0, 8 and n>0):
>    .even
>    .globl _sum_vec
> _sum_vec:
>    lea (6,%sp),%a0         | 8
>    move.w 4(%sp),%d1       | 12
>    clr.w %d0               | 4
>    jra .L1                 | 12
> .L2:
>    add.w (%a0)+,%d0        | 8
> .L1:
>    dbra %d1,.L2            | 16,12
>    rts                     | 16
> | c==0: 8+12+4+12+12+16=64
> | c==8: 8+12+4+12+(16+8)*8+12+16=256
> | c==n: =64+24n
> 
> When instead compiling with "-fomit-frame-pointer -O3 -march=68000 -c
> -S -mshort" I expect to get more aggressive optimisation than -Os, or
> at least just as performant, but instead I get this:
>    .even
>    .globl _sum_vec
> _sum_vec:
>    move.w 4(%sp),%d0       | 12
>    jeq .L2                 | 12,8
>    lea (6,%sp),%a0         | 8
>    subq.w #1,%d0           | 4
>    and.l #65535,%d0        | 16
>    add.l %d0,%d0           | 8
>    lea 8(%sp,%d0.l),%a1    | 16
>    clr.w %d0               | 4
> .L1:
>    add.w (%a0)+,%d0        | 8
>    cmp.l %a0,%a1           | 8
>    jne .L1                 | 12,8
>    rts                     | 16
> .L2:
>    clr.w %d0               | 4
>    rts                     | 16
> | c==0: 12+12+4+16=44
> | c==8: 12+8+8+4+16+8+16+4+(8+8+12)*8-4+16=312
> | c==n: =88+28n
> 
> The count==0 case is better. I can see what optimisation has been
> tried for the loop, but it is just not working, since both the init
> for the loop and the loop itself become more costly.
> 
> Being a GCC beginner I would like a few pointers as to how I should go
> about fixing this.
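
For reference, the two closed-form totals quoted above can be written down as
a small cost model, which makes it easy to compare them for other counts.
This is only a sketch: the function names are made up here, and the constants
are taken unchanged from the cycle annotations in the listings, not measured.

```c
/* Cost model built from the cycle annotations in the two listings.
   The function names are hypothetical; the constants come from the
   "c==0" and "c==n" comment lines. */

/* -Os version: 64 cycles of fixed cost plus 24 cycles per element. */
unsigned cycles_os(unsigned n) { return 64u + 24u * n; }

/* -O3 version: the early exit costs 44 cycles for n == 0, otherwise
   88 cycles of fixed cost plus 28 cycles per element. */
unsigned cycles_o3(unsigned n) { return n == 0 ? 44u : 88u + 28u * n; }
```

With these formulas the -O3 code only wins for a count of 0; for any n > 0 it
is 24 + 4n cycles slower than the -Os code.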

The way to investigate such problems is to compare the intermediate debug
dumps of the two compilation scenarios; by the time you are looking at the
assembly it is almost impossible to guess where the problem comes from.  Add
-fdump-tree-all and -fdump-rtl-all to the compilation flags and find the
optimization pass that makes the wrong decision.  Then either debug that pass
yourself or file a bug report in the hope that someone (the maintainer of that
optimization) will look at it.
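
A reduced, self-contained source file makes the dump comparison (and any later
bug report) easier.  The sketch below is just the function from the original
message in a standalone file; the file name sum_vec.c, the cross-compiler name
m68k-elf-gcc and the directory names in the comment are assumptions, not
something taken from this thread.

```c
/*
 * sum_vec.c: reduced test case (the file name is an assumption).
 *
 * One possible way to get two comparable sets of dumps, assuming a cross
 * compiler called m68k-elf-gcc.  Run each command in its own directory
 * (say os/ and o3/) so the dump files do not overwrite each other:
 *
 *   m68k-elf-gcc -fomit-frame-pointer -Os -march=68000 -mshort -c -S \
 *       -fdump-tree-all -fdump-rtl-all ../sum_vec.c
 *   m68k-elf-gcc -fomit-frame-pointer -O3 -march=68000 -mshort -c -S \
 *       -fdump-tree-all -fdump-rtl-all ../sum_vec.c
 */
#include <stdarg.h>

/* Sum the c int arguments that follow the count. */
int sum_vec(int c, ...) {
    va_list argptr;
    int sum = 0;

    va_start(argptr, c);
    while (c--) {
        sum += va_arg(argptr, int);
    }
    va_end(argptr);
    return sum;
}
```

Each pass writes its own dump file, so comparing the two directories pass by
pass should show where the -O3 version first diverges from the -Os one.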

Read through the GCC wiki for information on debugging and troubleshooting GCC:
- http://gcc.gnu.org/wiki/GettingStarted
- http://gcc.gnu.org/wiki/FAQ
- http://gcc.gnu.org/wiki/

Thanks,

--
Maxim Kuvyrkov
www.kugelworks.com


