http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

             Bug #: 55073
           Summary: Wrong Neon code generation at -O2 caused by -fschedule-insns
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: eric.ba...@allegorithmic.com

Created attachment 28528
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28528
Zipfile with repro case, build script, disassembly listings and register flow analysis

Using gcc trunk at rev 192800, compiled with the Android NDK's build-gcc.sh
script (arm-linux-androideabi target).

Compiling the attached repro case at -O2 yields incorrect results. Correct
results are generated with -O2 -fno-schedule-insns.

The command line to build an incorrect program is:

    arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -mfpu=neon -fpic -marm -O2 -fno-strict-aliasing -Wall -o repro_ko repro.cpp

The command line to build a correct program is:

    arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -mfpu=neon -fpic -marm -O2 -fno-schedule-insns -fno-strict-aliasing -Wall -o repro_ok repro.cpp

I am aware that the test case is quite convoluted, but this is because we use
a kind of "universal" 128-bit vector type that autoconverts to and from other
Neon types (not all ARM compilers have -flax-vector-conversions). Still, both
programs should output the same results.

The body of the failing function is pasted below (prolog and epilog omitted):

Correct code (-O2 -fno-schedule-insns):

    vmov      d19, d20  @ v8qi
    vmov      d21, d18  @ v8qi
    vmov      d20, d19  @ v8qi
    vzip.8    d19, d18
    vzip.8    d21, d20
    vswp      d18, d19
    vswp      d20, d21
    vmov      d21, d19  @ v8qi
    vmov      d19, d20  @ v8qi
    vzip.8    d21, d20
    vzip.8    d19, d18
    vswp      d20, d21
    vswp      d18, d19
    vmovl.s8  q10, d21
    vmovl.s8  q9, d19
    vsub.i16  q9, q9, q8
    vsub.i16  q8, q10, q8
    vadd.i16  q8, q9, q8
    vst1.64   {d16-d17}, [r0:128]

Incorrect code (-O2):

    vmov      d19, d20  @ v8qi
    vmov      d22, d18  @ v8qi
    vmov      d21, d20  @ v8qi
    vzip.8    d19, d18
    vzip.8    d22, d21
    vswp      d18, d19
    vmov      d20, d22  @ v8qi
    vmov      d21, d18  @ v8qi
    vzip.8    d22, d19
    vzip.8    d21, d20
    vmovl.s8  q9, d22
    vswp      d20, d21
    vsub.i16  q9, q9, q8
    vmovl.s8  q10, d21
    vsub.i16  q8, q10, q8
    vadd.i16  q8, q9, q8
    vst1.64   {d16-d17}, [r0:128]

I have attached a build.sh script that builds the two versions (OK and KO) of
the output programs. These programs can be run on any Android ARMv7 target.
The problem probably occurs with Linux builds of gcc as well.

I did some register flow tracing to give formal expressions of what ends up in
the return value (more precisely, the register contents just before the
vsub/vsub/vadd sequence). This is in the attached bug_gcc.txt file, which
should be read with hard tabs and a tab length of about 30 for the formatting
to work.

I don't know if this is related to bug 54300, which is still "unconfirmed"
although I confirmed that it occurs even with -fno-strict-aliasing. Do I need
to provide more information on that one?
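
For anyone who wants to redo the register flow tracing by hand, the two
listings above can be replayed with a small scalar model of the instructions
involved. The following is only a sketch and is not taken from the attachment:
the names D8, vzip8, vswp and check_model are hypothetical helpers made up for
illustration, and the lane ordering assumes the VZIP behaviour described in the
ARM architecture manual (lanes of the first operand land in the even
positions). A vmov between D registers is modelled as plain assignment.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical scalar model of one 64-bit Neon D register seen as eight
// 8-bit lanes (lane 0 is the least significant byte).
using D8 = std::array<std::int8_t, 8>;

// Models `vzip.8 dA, dB`: interleaves the lanes of a and b in place.
// Afterwards a holds a0,b0,a1,b1,a2,b2,a3,b3 and b holds a4,b4,...,a7,b7.
inline void vzip8(D8& a, D8& b) {
    D8 lo{}, hi{};
    for (int i = 0; i < 4; ++i) {
        lo[2 * i]     = a[i];
        lo[2 * i + 1] = b[i];
        hi[2 * i]     = a[i + 4];
        hi[2 * i + 1] = b[i + 4];
    }
    a = lo;
    b = hi;
}

// Models `vswp dA, dB`: exchanges the full contents of the two registers.
inline void vswp(D8& a, D8& b) {
    std::swap(a, b);
}

// Sanity check of the model itself, using distinct lane tags so that each
// lane's origin is visible after the shuffle.
inline void check_model() {
    D8 a = {0, 1, 2, 3, 4, 5, 6, 7};
    D8 b = {10, 11, 12, 13, 14, 15, 16, 17};
    vzip8(a, b);
    assert((a == D8{0, 10, 1, 11, 2, 12, 3, 13}));
    assert((b == D8{4, 14, 5, 15, 6, 16, 7, 17}));
}
```

Seeding d18 and d20 with distinct lane tags, applying each listing instruction
by instruction, and comparing the registers that feed the two vmovl.s8
instructions should show where the two schedules diverge.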