https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80862
Bug ID: 80862
Summary: [x86] Wrong rounding results for some test cases
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sebastian.peryt at intel dot com
CC: julia.koval at intel dot com, ubizjak at gmail dot com
Target Milestone: ---
Target: X86

Created attachment 41408
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41408&action=edit
Patch to reproduce described error.

I recently found that the rounding intrinsics produce wrong results in some
particular cases. Three specific conditions have to be fulfilled to trigger
the problem:
- the test has to be compiled with O1 or O2 (it doesn't appear at O0),
- the test case has to contain only two intrinsics: a regular one (e.g.
  _mm512_cvtps_epi32) and a round one (e.g. _mm512_cvt_roundps_epi32),
- both intrinsics must use the same input argument.

As a result, the value from the first (regular) intrinsic is copied into the
result of the second (round) intrinsic. In the asm output it can be seen that
the same register is used for both assignments:

    vcvtps2dq %zmm0, %zmm1
    vmovdqa64 %zmm1, -368(%rbp)
    pushq -312(%rbp)
    pushq -320(%rbp)
    pushq -328(%rbp)
    vcvtps2dq {rz-sae}, %zmm0, %zmm0
    pushq -336(%rbp)
    vmovdqa64 %zmm1, -304(%rbp)

Both vmovdqa64 stores read %zmm1, so the result of the {rz-sae} conversion in
%zmm0 is never stored.

From what I have gathered so far, this happens because the rounding md
template in i386/subst.md uses a parallel side effect. Because parallel
evaluates each side effect individually, the cse1 pass optimizes the part
that is the same for both intrinsics. After that, the same register is
assigned to the move operation for both result assignments, and effectively
the regular and the round intrinsic produce the same result.

Probably some other side effect has to be used to set the rounding flags in
order to fix this issue, but I am not sure which one it should be.
Alternatively, cse.c may have to be modified to handle this use of parallel
properly.