http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398
--- Comment #4 from Carrot <carrot at google dot com> 2012-09-07 01:19:31 UTC ---

The code before the point Ahmad pointed out is already wrong. The faulty instruction sequence is:

    asrs  r5, r5, #1
    asr   ip, ip, #1        ; A, tmp1.x
    asrs  r0, r0, #1        ; B, tmp1.y
    asrs  r6, r6, #1
    mov   r4, r1
    add   r8, ip, r6        ; C, tmp3.x
    add   r9, r0, r5        ; D, tmp3.y
    add   r7, sp, #0
    asr   r1, r8, #1
    add   ip, r4, #8        ; E
    asr   r9, r9, #1
    str   r1, [r7, #16]
    str   r9, [r7, #20]
    ldmia r3, {r0, r1}      ; F
    stmia r4, {r0, r1}

Instruction A computes tmp1.x, instruction C uses it to compute tmp3.x, and instruction E overwrites the register holding tmp1.x. But in the source code, tmp1.x is still needed to execute "dst1->p2 = tmp1;", so dst1->p2.x ends up with a garbage value.

Similarly, instruction B computes tmp1.y, instruction D uses it to compute tmp3.y, and instruction F overwrites it. After "dst1->p2 = tmp1;" executes, dst1->p2.y gets another garbage value.

For comparison, the following is the correct version:

    asrs  r7, r7, #1        ; A, tmp1.x
    asrs  r0, r0, #1        ; B, tmp1.y
    asrs  r6, r6, #1
    asrs  r5, r5, #1
    sub   sp, sp, #28
    mov   r4, r1
    add   r8, r7, r6        ; C, tmp3.x
    add   ip, r0, r5        ; D, tmp3.y
    str   r7, [sp, #0]      ; X, save tmp1.x
    str   r0, [sp, #4]      ; Y, save tmp1.y
    asr   r1, ip, #1
    add   r7, r4, #8        ; E
    asr   r8, r8, #1
    str   r1, [sp, #20]
    str   r8, [sp, #16]
    ldmia r3, {r0, r1}      ; F
    stmia r4, {r0, r1}

The obvious difference is the extra instructions X and Y: they save the value of tmp1 to the stack before the registers holding it are reused.
The simplified preprocessed source code is:

    struct A {
      int x;
      int y;
      void f(const A &a, const A &b) {
        x = (a.x + b.x)>>1;
        y = (a.y + b.y)>>1;
      }
    };

    class C {
    public:
      A p1;
      A p2;
      A p3;
      bool b;
      void g(C *, C *) const;
    };

    void C::g(C *dst1, C *dst2) const {
      A tmp1, tmp2, tmp3;
      tmp1.f(p2,p1);
      tmp2.f(p2,p3);
      tmp3.f(tmp1, tmp2);
      dst1->p1 = p1;
      dst1->p2 = tmp1;
      dst1->p3 = dst2->p1 = tmp3;
      dst2->p2 = tmp2;
      dst2->p3 = p3;
    }

The simplified command line is:

    ./cc1plus -fpreprocessed t.ii -quiet -dumpbase t.cpp -mthumb "-march=armv7-a" "-mtune=cortex-a15" -auxbase t -O2 -fno-omit-frame-pointer -o t.s

It looks like the dse2 pass performed a wrong transformation. GCC 4.7 and trunk generate correct code.