http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60888
Bug ID: 60888
Summary: x86 vector widen multiplication by constant is not replaced with shift and sub
Product: gcc
Version: 4.10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: evstupac at gmail dot com

Created attachment 32631
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32631&action=edit
test case

For the following test case:

void foo(char *out, char *in)
{
    int i;
    for(i = 0; i < 1024; i++)
        out[i] = (in[i] * 32767) >> 15;
}

compiled with: -O3 -m32 -msse2 -S -fdump-tree-vect-details

GCC generates the following code at 114t.vect:

  vect_cst_.16_106 = { 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767 };
  ...
  vect_patt_24.15_107 = WIDEN_MULT_LO_EXPR <vect__25.14_104, vect_cst_.16_106>;
  vect_patt_24.15_108 = WIDEN_MULT_HI_EXPR <vect__25.14_104, vect_cst_.16_106>;
  vect_patt_24.15_109 = WIDEN_MULT_LO_EXPR <vect__25.14_105, vect_cst_.16_106>;
  vect_patt_24.15_110 = WIDEN_MULT_HI_EXPR <vect__25.14_105, vect_cst_.16_106>;

These 4 multiplications remain in the final assembly:

    ...
    punpcklbw   %xmm0, %xmm2
    punpckhbw   %xmm0, %xmm5
    pmullw      %xmm2, %xmm1
    movdqa      %xmm1, %xmm0
    pmulhw      %xmm3, %xmm2
    ...

However:

    out[i] = ((in[i] << 15) - in[i]) >> 15;

is faster and computes the same result.
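
The equivalence claimed above can be checked exhaustively, since in[i] only takes char values. The following is a minimal sketch, not part of the attached test case; the helper names mul_form, shift_sub_form and count_mismatches are made up for illustration. It writes the shift as a multiplication by 32768 because left-shifting a negative int is undefined in ISO C, and it covers both the signed and the unsigned char range.

```c
#include <assert.h>

/* Form from the test case: multiply by 32767, then shift right by 15. */
static int mul_form(int x)
{
    return (x * 32767) >> 15;
}

/* Proposed form: x * 2^15 - x, then shift right by 15.
   x * 32768 stands in for x << 15 so negative inputs stay well defined. */
static int shift_sub_form(int x)
{
    return (x * 32768 - x) >> 15;
}

/* Count inputs for which the two forms disagree, over every value a
   signed or unsigned 8-bit char can hold after promotion to int. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int v = -128; v <= 255; v++) {
        if (mul_form(v) != shift_sub_form(v))
            bad++;
    }
    return bad;
}

/* Aborts if any input disagrees (hypothetical helper for quick checks). */
static void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

Both forms produce the same int before the final shift (x * 32767 equals x * 32768 - x, and neither overflows for 8-bit inputs), so count_mismatches() returns 0 regardless of how the implementation shifts negative values.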