http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398

Carrot <carrot at google dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carrot at google dot com

--- Comment #4 from Carrot <carrot at google dot com> 2012-09-07 01:19:31 UTC ---
The code before the position Ahmad pointed out is already wrong.

The faulty instruction sequence is:

       asrs    r5, r5, #1
       asr     ip, ip, #1      ; A, tmp1.x
       asrs    r0, r0, #1     ; B, tmp1.y
       asrs    r6, r6, #1
       mov     r4, r1
       add     r8, ip, r6      ; C, tmp3.x
       add     r9, r0, r5      ; D, tmp3.y
       add     r7, sp, #0
       asr     r1, r8, #1
       add     ip, r4, #8      ; E,
       asr     r9, r9, #1
       str     r1, [r7, #16]
       str     r9, [r7, #20]
       ldmia   r3, {r0, r1}    ; F,
       stmia   r4, {r0, r1}

Instruction A computes tmp1.x into ip, and instruction C uses it to compute
tmp3.x. Instruction E then overwrites ip, destroying tmp1.x. But in the source
code, tmp1.x is still needed to execute "dst1->p2 = tmp1;", so in the end
dst1->p2.x gets garbage.

Similarly instruction B computes tmp1.y, instruction D uses it to compute
tmp3.y, instruction F overwrites it. After executing "dst1->p2 = tmp1;",
dst1->p2.y gets another garbage value.


For comparison, the following is the correct version:

       asrs    r7, r7, #1    ; A, tmp1.x
       asrs    r0, r0, #1    ; B, tmp1.y
       asrs    r6, r6, #1
       asrs    r5, r5, #1
       sub     sp, sp, #28
       mov     r4, r1
       add     r8, r7, r6    ; C, tmp3.x
       add     ip, r0, r5    ; D, tmp3.y
       str     r7, [sp, #0]  ; X, save tmp1.x
       str     r0, [sp, #4]  ; Y, save tmp1.y
       asr     r1, ip, #1
       add     r7, r4, #8   ; E
       asr     r8, r8, #1
       str     r1, [sp, #20]
       str     r8, [sp, #16]
       ldmia   r3, {r0, r1}  ; F
       stmia   r4, {r0, r1}

The obvious difference is the extra instructions X and Y, which save the value
of tmp1 to the stack before its registers are reused.

The simplified preprocessed source code is:


struct A
{
              int x;
              int y;

              void f(const A &a, const A &b)
              {
                      x = (a.x + b.x)>>1;
                      y = (a.y + b.y)>>1;
              }
};

class C {
public:
                A p1;
                A p2;
                A p3;

                bool b;
                void g(C *, C *) const;
};

void C::g(C *dst1, C *dst2) const
{
             A tmp1, tmp2, tmp3;

             tmp1.f(p2,p1);
             tmp2.f(p2,p3);
             tmp3.f(tmp1, tmp2);

             dst1->p1 = p1;
             dst1->p2 = tmp1;
             dst1->p3 =
             dst2->p1 = tmp3;
             dst2->p2 = tmp2;
             dst2->p3 = p3;
}
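
As a rough way to check the result at run time, the source above could be
wrapped in a small driver. This is only a sketch and was not part of the
original test case: check_g() and the input values are made up for
illustration, and an optimizing compiler may inline or constant-fold g() in
this form, so it is not guaranteed to trigger the same miscompilation.

```cpp
#include <cassert>

struct A
{
    int x;
    int y;

    void f(const A &a, const A &b)
    {
        x = (a.x + b.x) >> 1;
        y = (a.y + b.y) >> 1;
    }
};

class C {
public:
    A p1;
    A p2;
    A p3;

    bool b;
    void g(C *, C *) const;
};

void C::g(C *dst1, C *dst2) const
{
    A tmp1, tmp2, tmp3;

    tmp1.f(p2, p1);
    tmp2.f(p2, p3);
    tmp3.f(tmp1, tmp2);

    dst1->p1 = p1;
    dst1->p2 = tmp1;
    dst1->p3 =
    dst2->p1 = tmp3;
    dst2->p2 = tmp2;
    dst2->p3 = p3;
}

// Hypothetical helper (not in the original report): returns true when
// g() stores the values the source code asks for.
bool check_g()
{
    // Arbitrary illustrative inputs: p1 = (2,4), p2 = (6,8), p3 = (10,12).
    C src = {{2, 4}, {6, 8}, {10, 12}, false};
    C d1 = {};
    C d2 = {};

    src.g(&d1, &d2);

    // tmp1 = midpoint(p2, p1)     = (4, 6)
    // tmp2 = midpoint(p2, p3)     = (8, 10)
    // tmp3 = midpoint(tmp1, tmp2) = (6, 8)
    return d1.p1.x == 2  && d1.p1.y == 4
        && d1.p2.x == 4  && d1.p2.y == 6    // the fields reported as garbage
        && d1.p3.x == 6  && d1.p3.y == 8
        && d2.p1.x == 6  && d2.p1.y == 8
        && d2.p2.x == 8  && d2.p2.y == 10
        && d2.p3.x == 10 && d2.p3.y == 12;
}
```

Calling check_g() from main() and aborting when it returns false would flag a
build in which dst1->p2 no longer holds tmp1.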

The simplified command line is:

./cc1plus -fpreprocessed t.ii -quiet -dumpbase t.cpp -mthumb "-march=armv7-a"
"-mtune=cortex-a15" -auxbase t -O2 -fno-omit-frame-pointer -o t.s

It looks like the dse2 pass performed an incorrect transformation.

GCC 4.7 and trunk generate correct code.
