[Bug target/55073] New: Wrong Neon code generation at -O2 caused by -fschedule-insns

eric.batut at allegorithmic dot com Thu, 25 Oct 2012 05:54:37 -0700


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073




             Bug #: 55073

           Summary: Wrong Neon code generation at -O2 caused by

                    -fschedule-insns

    Classification: Unclassified

           Product: gcc

           Version: 4.8.0

            Status: UNCONFIRMED

          Severity: normal

          Priority: P3

         Component: target

        AssignedTo: unassig...@gcc.gnu.org

        ReportedBy: eric.ba...@allegorithmic.com





Created attachment 28528

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28528

Zipfile with repro case, build script, disassembly listings and register flow

analysis



Using gcc trunk at rev 192800, compiled with the Android NDK's build-gcc.sh

script (arm-linux-androideabi target).



Compiling the attached repro case at -O2 yields incorrect results. Correct

results are generated for -O2 -fno-schedule-insns.



The command line that builds the incorrect program is:

arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp

-mfpu=neon -fpic -marm -O2 -fno-strict-aliasing -Wall -o repro_ko repro.cpp



The command line that builds the correct program is:

arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp

-mfpu=neon -fpic -marm -O2 -fno-schedule-insns -fno-strict-aliasing -Wall -o

repro_ok repro.cpp



I am aware that the test case is quite convoluted but this is because we use

some kind of "universal" 128b vector type that autoconverts to and from other

Neon types (not all ARM compilers have -flax-vector-conversions). Still, both

programs should output the same results.



The body of the failing function is pasted below (prolog and epilog omitted):

Correct code (-O2 -fno-schedule-insns):

    vmov    d19, d20  @ v8qi

    vmov    d21, d18  @ v8qi

    vmov    d20, d19  @ v8qi

    vzip.8    d19, d18

    vzip.8    d21, d20

    vswp    d18, d19

    vswp    d20, d21

    vmov    d21, d19  @ v8qi

    vmov    d19, d20  @ v8qi

    vzip.8    d21, d20

    vzip.8    d19, d18

    vswp    d20, d21

    vswp    d18, d19

    vmovl.s8    q10, d21

    vmovl.s8    q9, d19

    vsub.i16    q9, q9, q8

    vsub.i16    q8, q10, q8

    vadd.i16    q8, q9, q8

    vst1.64    {d16-d17}, [r0:128]



Incorrect code (-O2):

    vmov    d19, d20  @ v8qi

    vmov    d22, d18  @ v8qi

    vmov    d21, d20  @ v8qi

    vzip.8    d19, d18

    vzip.8    d22, d21

    vswp    d18, d19

    vmov    d20, d22  @ v8qi

    vmov    d21, d18  @ v8qi

    vzip.8    d22, d19

    vzip.8    d21, d20

    vmovl.s8    q9, d22

    vswp    d20, d21

    vsub.i16    q9, q9, q8

    vmovl.s8    q10, d21

    vsub.i16    q8, q10, q8

    vadd.i16    q8, q9, q8

    vst1.64    {d16-d17}, [r0:128]



I have attached a build.sh script that builds the two versions (OK and KO) of

the output programs. These programs need to be run on an Android ARMv7 target.

This probably happens with Linux builds of gcc as well.



I did some register flow tracing to give formal expressions of what ends up in

the return value (well, just before the vsub/vsub/vadd actually). This is in

the attached bug_gcc.txt file (which should be read with hard tabs, tab length

set to 30 or something in order for the formatting to work).
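
As a rough illustration of this kind of register flow tracing (it is not the
attached bug_gcc.txt), the sketch below replays the vmov/vzip.8/vswp part of
both listings on symbolic byte lanes. The helper names, the lane labels a0..a7
(initial d20) and b0..b7 (initial d18), and the choice to stop at the operands
of the two vmovl.s8 instructions are all assumptions made for illustration.
The prolog, q8, and the vsub/vadd tail are not modelled.

```python
# Minimal symbolic model of three NEON D-register operations.
# Each register holds a list of 8 lane labels instead of real bytes.

def vmov(regs, dd, dm):
    """vmov Dd, Dm: copy Dm into Dd."""
    regs[dd] = list(regs[dm])

def vswp(regs, dd, dm):
    """vswp Dd, Dm: exchange the contents of Dd and Dm."""
    regs[dd], regs[dm] = regs[dm], regs[dd]

def vzip8(regs, dd, dm):
    """vzip.8 Dd, Dm: interleave lanes; low half to Dd, high half to Dm."""
    interleaved = [lane for pair in zip(regs[dd], regs[dm]) for lane in pair]
    regs[dd], regs[dm] = interleaved[:8], interleaved[8:]

def run(program):
    """Replay a list of (op, Dd, Dm) tuples from the assumed initial state."""
    regs = {
        "d20": [f"a{i}" for i in range(8)],  # assumed live-in
        "d18": [f"b{i}" for i in range(8)],  # assumed live-in
    }
    for op, dd, dm in program:
        op(regs, dd, dm)
    return regs

# Listing built with -O2 -fno-schedule-insns, up to the vmovl.s8 pair.
OK_PROGRAM = [
    (vmov, "d19", "d20"), (vmov, "d21", "d18"), (vmov, "d20", "d19"),
    (vzip8, "d19", "d18"), (vzip8, "d21", "d20"),
    (vswp, "d18", "d19"), (vswp, "d20", "d21"),
    (vmov, "d21", "d19"), (vmov, "d19", "d20"),
    (vzip8, "d21", "d20"), (vzip8, "d19", "d18"),
    (vswp, "d20", "d21"), (vswp, "d18", "d19"),
]

# Listing built with -O2. The vmovl.s8 q9, d22 that sits between the last
# vzip.8 and the last vswp only reads d22, which is not written afterwards,
# so it is safe to inspect d22 at the end.
KO_PROGRAM = [
    (vmov, "d19", "d20"), (vmov, "d22", "d18"), (vmov, "d21", "d20"),
    (vzip8, "d19", "d18"), (vzip8, "d22", "d21"),
    (vswp, "d18", "d19"),
    (vmov, "d20", "d22"), (vmov, "d21", "d18"),
    (vzip8, "d22", "d19"), (vzip8, "d21", "d20"),
    (vswp, "d20", "d21"),
]

ok_regs = run(OK_PROGRAM)
ko_regs = run(KO_PROGRAM)

# Operands of the two vmovl.s8 instructions in each listing. The final
# vadd.i16 is commutative, so the pairs are compared without regard to order.
ok_sources = sorted([ok_regs["d19"], ok_regs["d21"]])
ko_sources = sorted([ko_regs["d22"], ko_regs["d21"]])

print("OK widen sources:", ok_sources)
print("KO widen sources:", ko_sources)
print("same lanes feed the result:", ok_sources == ko_sources)
```

Printing lane labels rather than concrete byte values makes it possible to
see which input lane reaches which output position in each listing.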



I don't know if this is related to bug 54300 (which by the way is still

"unconfirmed" although I confirmed that it occurs even with -fno-strict-aliasing;

do I need to provide more info on that one?).
