https://llvm.org/bugs/show_bug.cgi?id=26110
Ahmed Bougacha <ahmed.bouga...@gmail.com> changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
             Status|RESOLVED                       |REOPENED
                 CC|                               |ahmed.bouga...@gmail.com
          Component|LLVM Codegen                   |Backend: X86
            Version|3.7                            |trunk
         Resolution|FIXED                          |---
           Assignee|unassignedclangbugs@nondot.org |unassignedb...@nondot.org
            Product|clang                          |libraries

--- Comment #3 from Ahmed Bougacha <ahmed.bouga...@gmail.com> ---
So, I looked a little closer. Sanjay's bisect was correct. clang-700 is pretty
old now; I bisected to:

  r229099 [SimplifyCFG] Be more aggressive

Sure enough, this still reproduces on trunk with
-mllvm -phi-node-folding-threshold=1.

Long story short: the problematic pattern is:

  (c ? -v : v)

which we lower to the following (because "c" is <4 x i1>, lowered as a vector
mask):

  (~c & v) | (c & -v)

roughly corresponding to this IR:

  define <4 x i32> @t(<4 x i32> %v, <4 x i32> %c) {
    %cl = shl <4 x i32> %c, <i32 31, i32 31, i32 31, i32 31>
    %cs = ashr <4 x i32> %c, <i32 31, i32 31, i32 31, i32 31>
    %tmp2 = trunc <4 x i32> %cs to <4 x i1>
    ; ^ not as artificial as it looks: equivalent to a legalized vsetcc
    %mv = sub nsw <4 x i32> zeroinitializer, %v
    %r = select <4 x i1> %tmp2, <4 x i32> %v, <4 x i32> %mv
    ret <4 x i32> %r
  }

The SSE2 codegen is pretty straightforward (the second comment column traces a
single lane with v = 5 and c = 0):

  xorps  %xmm1, %xmm1
  ...                   # xmm6 <- %v
  ...                   # xmm3 <- %c
  psubd  %xmm6, %xmm1   # 0 - v                 # 0 - 5  -> -5
  movaps %xmm3, %xmm0   # c                     # 0      -> 0
  pandn  %xmm6, %xmm0   # ~c & v                # ~0 & 5 -> 5
  pand   %xmm3, %xmm1   # c & -v                # -5 & 0 -> 0
  por    %xmm0, %xmm1   # (~c & v) | (c & -v)   # 0 | 5  -> 5

However, when we have SSSE3 (the default on OS X), we try to match it to
PSIGND and emit this instead:

  psignd %xmm3, %xmm1   # (c < 0 ? -v : (c > 0 ? v : 0))
                        # c is a mask, so (c > 0) == 0
                        # (c ? -v : 0)
                        # (0 ? -5 : 0)
                        # -> 0

which is not equivalent; one computes:

  (c ? -v : 0)

and the other:

  (c ? -v : v)

This bug has existed since 2010. However, I think it went unnoticed because of
operand canonicalization.
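The mismatch between the two sequences can be checked with a small Python model of a single 32-bit lane. This is a sketch under stated assumptions: the helper names (`to_i32`, `psignd_lane`, `sse2_select_lane`) are hypothetical and exist only for illustration; `psignd_lane` follows the documented PSIGND behavior (negate the destination when the source is negative, zero it when the source is zero, pass it through otherwise), and `c` is assumed to be an all-ones or all-zeros lane mask, as in the SSE2 listing.

```python
MASK32 = 0xFFFFFFFF


def to_i32(x):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def psignd_lane(dst, src):
    """One lane of `psignd src, dst` (AT&T order): result depends on sign of src."""
    src = to_i32(src)
    if src < 0:
        return to_i32(-dst)  # negative source: negate (wraps like the hardware)
    if src == 0:
        return 0  # zero source: lane is zeroed
    return to_i32(dst)  # positive source: pass through


def sse2_select_lane(v, c):
    """(~c & v) | (c & -v): the psubd/pandn/pand/por sequence; c is 0 or -1."""
    neg_v = to_i32(-v)  # psubd: 0 - v
    return to_i32((~c & v) | (c & neg_v))


# Hypothetical demo: v = 5 with both possible mask values.
v = 5
for c in (0, -1):
    print(c, sse2_select_lane(v, c), psignd_lane(v, c))
# prints "0 5 0" then "-1 -5 -5"
```

Only the c == 0 lane disagrees, which is exactly the (c ? -v : 0) versus (c ? -v : v) difference: PSIGND treats a zero mask lane as "zero the result" rather than "keep v".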
The PSIGN combine matches:

  (or (and m, x), (pandn m, (0 - x)))
  (or (and x, m), (pandn m, (0 - x)))
  (or (pandn m, (0 - x)), (and m, x))
  (or (pandn m, (0 - x)), (and x, m))

but not the variants of:

  (or (and m, (0 - x)), (pandn m, x))

which is what gets generated for the function above (the most obvious IR that
I could write).

I think this is pretty easy to fix: instead of using c itself as the PSIGN
operand, set any non-sign bit in it, so that lanes where c is 0 default to the
'v' case. So, this should work:

  por    <1,1,1,1>, %xmm3   # c' = c | 1
  psignd %xmm3, %xmm1       # (c' < 0 ? -v : (c' > 0 ? v : 0))
                            # c is a mask, so c' is either 1 or 0xff..f
                            # (c' == 0xff..f ? -v : (c' != 0 ? v : v))
                            # (c' == 0xff..f ? -v : v)
                            # (0 ? -5 : 5)
                            # -> 5

Constant-pool (CP) loads are cheap, so this is probably still a win over the
SSE2 codegen:

  psrad  $31, %xmm1
  pxor   %xmm2, %xmm2
  psubd  %xmm0, %xmm2
  pand   %xmm1, %xmm2
  pandn  %xmm0, %xmm1
  por    %xmm1, %xmm2
  movdqa %xmm2, %xmm0

Note that I don't think the couple of PSIGN tests in trunk are correct either.
Consider test/CodeGen/X86/vec-sign.ll:

  define <4 x i32> @signd(<4 x i32> %a, <4 x i32> %b) nounwind {
  entry:
    %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = and <4 x i32> %a, %0
    %2 = and <4 x i32> %b.lobit, %sub
    %cond = or <4 x i32> %1, %2
    ret <4 x i32> %cond
  }

If %b is zero, the body evaluates to:

    %b.lobit = <4 x i32> zeroinitializer
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = <4 x i32> %a
    %2 = <4 x i32> zeroinitializer
    %cond = or <4 x i32> %1, %2
    ret <4 x i32> %a

whereas we currently generate:

  psignd %xmm1, %xmm0
  retq

which returns 0, since %xmm1 is 0.

--
You are receiving this mail because:
You are on the CC list for the bug.
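The proposed c | 1 trick and the vec-sign.ll counterexample can be checked the same way. Again this is only a sketch: the helpers below are hypothetical, model one 32-bit lane in Python, and assume the documented PSIGND semantics; `signd_reference_lane` is a hand translation of the @signd IR, not code taken from LLVM.

```python
MASK32 = 0xFFFFFFFF


def to_i32(x):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def psignd_lane(dst, src):
    """One lane of `psignd src, dst`: negate, zero, or keep dst by the sign of src."""
    src = to_i32(src)
    if src < 0:
        return to_i32(-dst)
    if src == 0:
        return 0
    return to_i32(dst)


def select_neg_lane(v, c):
    """Reference semantics (c ? -v : v), with c an all-ones/all-zeros mask."""
    return to_i32(-v) if c else to_i32(v)


def psignd_fixed_lane(v, c):
    """Proposed lowering: por with <1,1,1,1>, then psignd."""
    c_prime = to_i32(c | 1)  # 0 -> 1 and -1 -> -1: never zero, sign bit kept
    return psignd_lane(v, c_prime)


def signd_reference_lane(a, b):
    """Lane-wise model of @signd in test/CodeGen/X86/vec-sign.ll."""
    b_lobit = to_i32(b) >> 31  # ashr %b, 31 (Python's >> is arithmetic)
    sub = to_i32(-a)  # sub nsw 0, %a
    t0 = b_lobit ^ -1  # xor with all ones
    t1 = a & t0  # %1
    t2 = b_lobit & sub  # %2
    return to_i32(t1 | t2)  # %cond


# %b = 0: IR semantics vs. the current `psignd %xmm1, %xmm0`.
print(signd_reference_lane(5, 0), psignd_lane(5, 0))  # 5 0
# Proposed fix with a zero mask lane keeps v, matching the select.
print(psignd_fixed_lane(5, 0), select_neg_lane(5, 0))  # 5 5
```

Because c | 1 is never zero and keeps the sign bit of the mask, PSIGND never hits its "zero the lane" case, so it reduces to the intended two-way select.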
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs