https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71488
Uroš Bizjak <ubizjak at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-06-13
          Component|target                      |middle-end
   Target Milestone|---                         |7.0
            Summary|Wrong code on GCC trunk     |[6/7 Regression] Wrong code
                   |with ivybridge and westmere |for vector comparisons with
                   |targets                     |ivybridge and westmere
                   |                            |targets
     Ever confirmed|0                           |1
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The following minimized test case shows the problem:
--cut here--
#include <iostream>
#include <valarray>

int var_4 = 1;
long long var_9 = 0;

int main() {
  std::valarray<std::valarray<long long>> v10;
  v10.resize(1);
  v10[0].resize(4);
  for (int i = 0; i < 4; i++)
    v10[0][i] = ((var_9 == 0) > unsigned (var_4 == 0)) + (var_9 == 0);
  std::cout << v10[0][0] << "\n";
}
--cut here--
Compile this test with "-std=c++11 -O3 -march=westmere" to obtain the wrong
result:
$ ./a.out
1
The correct result is obtained by adding -fno-tree-vectorize to the compile
flags:
$ ./a.out
2
Looking at the asm dump, the problematic loop is:
.L22:
        movddup    var_9(%rip), %xmm0
        pxor       %xmm1, %xmm1
(1)     pcmpeqq    %xmm1, %xmm0
        salq       $63, %rax
        movdqa     .LC0(%rip), %xmm2
        sarq       $63, %rax
        movq       %rax, %xmm1
(2)     movdqa     %xmm0, %xmm3
        punpcklqdq %xmm1, %xmm1
        pand       %xmm2, %xmm0
        shufps     $136, %xmm0, %xmm0
(3)     pcmpgtq    %xmm1, %xmm3
        movdqa     %xmm3, %xmm1
        pand       %xmm2, %xmm1
        shufps     $136, %xmm1, %xmm1
        paddd      %xmm1, %xmm0
        pmovsxdq   %xmm0, %xmm1
        psrldq     $8, %xmm0
        pmovsxdq   %xmm0, %xmm0
        movups     %xmm1, (%rdx)
        movups     %xmm0, 16(%rdx)
At insn (1), the vector (0xf...f,0xf...f) is generated as the result of
comparing vector (var_9,var_9) with vector (0,0). However, this result passes
through insn (2) directly into insn (3) as an input argument. This is certainly
wrong: the result of the comparison should first be masked with
(0x0...1,0x0...1).
The problem already exists at RTL expand time. The corresponding insn sequence
is:
;; mask__3.59_48 = vect_cst__51 == { 0, 0 };
(insn 117 116 118 (set (reg:V2DI 179)
        (vec_duplicate:V2DI (reg:DI 108 [ var_9.0_50 ]))) crash.cpp:29 4210 {*vec_dupv2di}
     (nil))
(insn 118 117 119 (set (reg:V2DI 180)
        (const_vector:V2DI [
                (const_int 0 [0])
                (const_int 0 [0])
            ])) crash.cpp:29 -1
     (nil))
(insn 119 118 120 (set (reg:V2DI 181)
        (eq:V2DI (reg:V2DI 179)
            (reg:V2DI 180))) crash.cpp:29 -1
     (nil))
(insn 120 119 0 (set (reg:V2DI 106 [ mask__3.59 ])
        (reg:V2DI 181)) crash.cpp:29 -1
     (nil))
;; vect_patt_111.61_79 = VEC_COND_EXPR <mask__3.59_48 > vect_cst__63, { 1, 1 }, { 0, 0 }>;
(insn 121 120 122 (set (reg:V2DI 182)
        (vec_duplicate:V2DI (reg:DI 117 [ _64 ]))) 4210 {*vec_dupv2di}
     (nil))
(insn 122 121 123 (set (reg:V2DI 183)
        (mem/u/c:V2DI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [5 S16 A128])) -1
     (expr_list:REG_EQUAL (const_vector:V2DI [
                (const_int 1 [0x1])
                (const_int 1 [0x1])
            ])
        (nil)))
(insn 123 122 124 (set (reg:V2DI 184)
        (gt:V2DI (reg:V2DI 106 [ mask__3.59 ])
            (reg:V2DI 182))) -1
     (nil))
(insn 124 123 0 (set (reg:V2DI 119 [ vect_patt_111.61 ])
        (and:V2DI (reg:V2DI 184)
            (reg:V2DI 183))) -1
     (nil))
Please note how the result of the comparison from (insn 119) directly enters
the follow-up comparison (insn 123). It looks to me that (insn 120) needs to be
an AND insn, as is the case with comparison (insn 123) and its corresponding
(insn 124).
Confirmed as a middle-end problem.