https://llvm.org/bugs/show_bug.cgi?id=29114
Bug ID: 29114 Summary: ext_vector_type(3) generates scalar divides Product: clang Version: 3.9 Hardware: PC OS: All Status: NEW Severity: normal Priority: P Component: LLVM Codegen Assignee: unassignedclangb...@nondot.org Reporter: t...@moomalade.com CC: llvm-bugs@lists.llvm.org Classification: Unclassified Created attachment 17034 --> https://llvm.org/bugs/attachment.cgi?id=17034&action=edit Example program showing good and bad vector div generation Performing a scalar divides on an ext_vector_type(3) produces 3 div instructions, rather than one. Addition, multiplication, subtraction all produce one instruction. This can be worked around by converting the value to an ext_vector_type(4), dividing, then converting back to ext_vector_type(3). Reproduces on Windows LLVM 3.9 and 4.0 snapshot and OSX with XCode clang 7.3.0. Example: This code will generate 3 divide instructions. typedef float vec3 __attribute__((ext_vector_type(3))); vec3 vec3_divide_bad(vec3 v, float d) { return v / d; } /* a.out`vec3_divide_bad: a.out[0x100000f60] <+0>: push rbp a.out[0x100000f61] <+1>: mov rbp, rsp a.out[0x100000f64] <+4>: movshdup xmm2, xmm0 ; xmm2 = xmm0[1,1,3,3] a.out[0x100000f68] <+8>: divss xmm2, xmm1 a.out[0x100000f6c] <+12>: movaps xmm3, xmm0 a.out[0x100000f6f] <+15>: shufpd xmm3, xmm3, 0x1 ; xmm3 = xmm3[1,0] a.out[0x100000f74] <+20>: divss xmm3, xmm1 a.out[0x100000f78] <+24>: divss xmm0, xmm1 a.out[0x100000f7c] <+28>: unpcklps xmm0, xmm3 ; xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] a.out[0x100000f7f] <+31>: unpcklps xmm0, xmm2 ; xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] a.out[0x100000f82] <+34>: pop rbp a.out[0x100000f83] <+35>: ret a.out[0x100000f84] <+36>: nop word ptr cs:[rax + rax] */ While this code will generate one divide instruction: typedef float vec3 __attribute__((ext_vector_type(3))); typedef float vec4 __attribute__((ext_vector_type(4))); vec3 vec3_divide_good(vec3 v, float d) { return ((vec4){v.x,v.y,v.z,0.0f} / d).xyz; } /* a.out`vec3_divide_good: a.out[0x100000f90] <+0>: push rbp a.out[0x100000f91] <+1>: mov rbp, rsp a.out[0x100000f94] <+4>: xorps xmm2, xmm2 a.out[0x100000f97] <+7>: shufps xmm2, xmm0, 0x23 ; xmm2 = xmm2[3,0],xmm0[2,0] a.out[0x100000f9b] <+11>: shufps xmm0, xmm2, 0x24 ; xmm0 = xmm0[0,1],xmm2[2,0] a.out[0x100000f9f] <+15>: shufps xmm1, xmm1, 0x0 ; xmm1 = xmm1[0,0,0,0] a.out[0x100000fa3] <+19>: divps xmm0, xmm1 a.out[0x100000fa6] <+22>: pop rbp a.out[0x100000fa7] <+23>: ret a.out[0x100000fa8] <+24>: nop dword ptr [rax + rax] */ -- You are receiving this mail because: You are on the CC list for the bug.
_______________________________________________ llvm-bugs mailing list llvm-bugs@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs