Re: Code optimization only in loops

Jean Christophe Beyler Wed, 13 May 2009 13:58:43 -0700

OK, for the i386 port, I use uint32_t instead of uint64_t because
otherwise the generated assembly code is a bit complicated (I'm on a
32-bit machine).


The tree dumps from final_cleanup are the same for the goo function:
goo (i)
{
<bb 2>:
  return data[i + 13] + data[i];

}
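
For readers who want to reproduce a dump of this shape, here is a
minimal sketch of a C function matching the tree dump above. It is not
the original ld.c: the array length N, the uint32_t element type (the
i386 variant mentioned above) and the parameter type are assumptions
made for illustration.

```c
#include <stdint.h>

/* Hypothetical array length; the original test case does not state it. */
#define N 64

/* Global array, so that its address shows up as a symbol_ref in RTL. */
uint32_t data[N];

/* Two loads at a fixed distance of 13 elements, then one addition.
   The caller must keep i + 13 below N. */
uint32_t goo(uint32_t i)
{
    return data[i + 13] + data[i];
}
```

Compiling a file like this with GCC's -fdump-tree-all and
-fdump-rtl-expand options should produce dumps comparable to the ones
quoted in this message, although register numbers and the exact dump
format depend on the GCC version and the target.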


However, the first RTL dump from expand gives this for the i386 port:

(insn 6 5 7 3 ld.c:17 (parallel [
            (set (reg:SI 61)
                (plus:SI (reg/v:SI 59 [ i ])
                    (const_int 13 [0xd])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 7 6 8 3 ld.c:17 (set (reg/f:SI 62)
        (symbol_ref:SI ("data") <var_decl 0xb7e7ce60 data>)) -1 (nil))

(insn 8 7 9 3 ld.c:17 (set (reg/f:SI 63)
        (symbol_ref:SI ("data") <var_decl 0xb7e7ce60 data>)) -1 (nil))

(insn 9 8 10 3 ld.c:17 (set (reg:SI 64)
        (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
                    (const_int 4 [0x4]))
                (reg/f:SI 63)) [3 data S4 A32])) -1 (nil))

(insn 10 9 11 3 ld.c:17 (set (reg:SI 65)
        (mem/s:SI (plus:SI (mult:SI (reg:SI 61)
                    (const_int 4 [0x4]))
                (reg/f:SI 62)) [3 data S4 A32])) -1 (nil))

(insn 11 10 12 3 ld.c:17 (parallel [
            (set (reg:SI 60)
                (plus:SI (reg:SI 65)
                    (reg:SI 64)))
            (clobber (reg:CC 17 flags))
        ]) -1 (expr_list:REG_EQUAL (plus:SI (mem/s:SI (plus:SI (mult:SI (reg:SI 61)
                        (const_int 4 [0x4]))
                    (reg/f:SI 62)) [3 data S4 A32])
            (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
                        (const_int 4 [0x4]))
                    (reg/f:SI 63)) [3 data S4 A32]))
        (nil)))

As we can see, the compiler adds 13 to i, loads the address of data,
multiplies each index by 4 (the element size) inside the memory
address, performs the two loads, and finally adds the results.

In my port, I get:

(insn 6 5 7 3 ld.c:17 (set (reg:DI 75)
        (plus:DI (reg/v:DI 73 [ i ])
            (const_int 13 [0xd]))) -1 (nil))

(insn 7 6 8 3 ld.c:17 (set (reg/f:DI 76)
        (symbol_ref:DI ("data") <var_decl 0xb7c85bb0 data>)) -1 (nil))

(insn 8 7 9 3 ld.c:17 (set (reg:DI 78)
        (const_int 3 [0x3])) -1 (nil))

(insn 9 8 10 3 ld.c:17 (set (reg:DI 77)
        (ashift:DI (reg:DI 75)
            (reg:DI 78))) -1 (nil))

(insn 10 9 11 3 ld.c:17 (set (reg/f:DI 79)
        (plus:DI (reg/f:DI 76)
            (reg:DI 77))) -1 (nil))

(insn 11 10 12 3 ld.c:17 (set (reg/f:DI 80)
        (symbol_ref:DI ("data") <var_decl 0xb7c85bb0 data>)) -1 (nil))

(insn 12 11 13 3 ld.c:17 (set (reg:DI 82)
        (const_int 3 [0x3])) -1 (nil))

(insn 13 12 14 3 ld.c:17 (set (reg:DI 81)
        (ashift:DI (reg/v:DI 73 [ i ])
            (reg:DI 82))) -1 (nil))

(insn 14 13 15 3 ld.c:17 (set (reg/f:DI 83)
        (plus:DI (reg/f:DI 80)
            (reg:DI 81))) -1 (nil))

(insn 15 14 16 3 ld.c:17 (set (reg:DI 84)
        (mem/s:DI (reg/f:DI 79) [2 data S8 A64])) -1 (nil))

(insn 16 15 17 3 ld.c:17 (set (reg:DI 85)
        (mem/s:DI (reg/f:DI 83) [2 data S8 A64])) -1 (nil))

(insn 17 16 18 3 ld.c:17 (set (reg:DI 74)
        (plus:DI (reg:DI 84)
            (reg:DI 85))) -1 (nil))


This seems to follow the same idea, except that the constant 3 gets
loaded into a register and a shift is performed. Could that be what is
causing my problem in code generation?
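
To make the difference between the two dumps concrete, here is a small
C sketch of the two ways the element address is formed. The function
names addr_mult and addr_shift are hypothetical, and the element size
of 8 bytes assumes the uint64_t (DI mode) variant. Both functions
compute the same address; they differ only in how many separate steps
the computation is split into.

```c
#include <stdint.h>

/* Shape of the i386 dump: the scaling is a multiplication folded into
   the address expression of the load (base + index * size). */
uint64_t addr_mult(uint64_t base, uint64_t index)
{
    return base + index * 8;
}

/* Shape of the other port's dump: the shift count is first placed in a
   register, then a separate shift and a separate add form the address. */
uint64_t addr_shift(uint64_t base, uint64_t index)
{
    uint64_t shift_count = 3;               /* set (reg) (const_int 3) */
    uint64_t scaled = index << shift_count; /* ashift */
    return base + scaled;                   /* plus */
}
```

Since index << 3 equals index * 8 for unsigned values, the two forms
are interchangeable as far as the result goes. Whether the extra
instructions in the second form are what blocks the later optimization
is exactly the open question here.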

I'm trying to figure out why my port generates a shift instead of
simply a mult. I actually changed the cost of a shift to a large value,
and then it uses adds instead of a mult. I think this is an rtx_cost
problem, where I'm not telling the compiler that a multiplication is
acceptable in this case.

I've been playing with rtx_cost but have been unable to really get it
to generate the right code.

Thanks again for your help and insight,
Jc

On Fri, May 8, 2009 at 5:18 AM, Paolo Bonzini <bonz...@gnu.org> wrote:
>
>> It seems that when set in a loop, the program is able to perform some
>> type of optimization to actually make use of the offsets, whereas
>> in the case of no loop, we have twice the number of instructions
>> for each address calculation.
>
> I suggest you look at the dumps for i386 to see which pass does the
> changes, and then see what happens in your port.
>
> Paolo
>
