http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47271
Summary: gcc-4.6 -O1 -ftree-vectorize removes a test (if); the
function generates invalid output
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: critical
Priority: P3
Component: c
AssignedTo: [email protected]
ReportedBy: [email protected]
I tried to compile Python 3.2 (r87949) with gcc (version 4.6.0 20100908) on
AMD64: Python fails with an assertion error or another strange crash. The
problem comes from a loop in Python/peephole.c. Compiled with -O1, it works
fine. Compiled with -O1 -ftree-vectorize, the function generates strange
(invalid) output.
gcc-4.6 -O1:
-------------------------------------
0x0000000000480991 <+5041>: mov %eax,%edx
0x0000000000480993 <+5043>: sub %esi,%edx
0x0000000000480995 <+5045>: mov %edx,(%r12,%rax,4)
0x0000000000480999 <+5049>: movzbl 0x0(%rbp,%rax,1),%edx
0x000000000048099e <+5054>: cmp $0x9,%dl
0x00000000004809a1 <+5057>: jne 0x4809a8 <PyCode_Optimize+5064>
0x00000000004809a3 <+5059>: add $0x1,%esi
0x00000000004809a6 <+5062>: jmp 0x4809b2 <PyCode_Optimize+5074>
0x00000000004809a8 <+5064>: mov $0x3,%ecx
0x00000000004809ad <+5069>: cmp $0x59,%dl
0x00000000004809b0 <+5072>: ja 0x4809b7 <PyCode_Optimize+5079>
0x00000000004809b2 <+5074>: mov $0x1,%ecx
0x00000000004809b7 <+5079>: add %rcx,%rax
0x00000000004809ba <+5082>: cmp %rax,%rdi
0x00000000004809bd <+5085>: jg 0x480991 <PyCode_Optimize+5041>
-------------------------------------
gcc-4.6 -O1 -ftree-vectorize:
-------------------------------------
0x0000000000480991 <+5041>: mov %eax,%ecx
0x0000000000480993 <+5043>: sub %edx,%ecx
0x0000000000480995 <+5045>: mov %ecx,(%r12,%rax,4)
0x0000000000480999 <+5049>: movzbl 0x0(%rbp,%rax,1),%ecx
0x000000000048099e <+5054>: lea 0x1(%rdx),%esi
0x00000000004809a1 <+5057>: cmp $0x9,%cl
0x00000000004809a4 <+5060>: cmovne %edx,%esi
0x00000000004809a7 <+5063>: cmove %esi,%edx
0x00000000004809aa <+5066>: setne %cl
0x00000000004809ad <+5069>: movzbl %cl,%ecx
0x00000000004809b0 <+5072>: lea 0x1(%rax,%rcx,2),%rax
0x00000000004809b5 <+5077>: cmp %rax,%rdi
0x00000000004809b8 <+5080>: jg 0x480991 <PyCode_Optimize+5041>
-------------------------------------
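Reading the two listings back into C, the difference seems to be in how the
loop index is advanced. The following is only a sketch reconstructed from the
disassembly, not code copied from Python/peephole.c; the function names
step_o1 and step_vect are hypothetical, and the constants 0x09 and 0x59 are
taken directly from the cmp instructions.

```c
#include <stddef.h>

/* Step computed by the -O1 listing: 1 if the opcode is 0x09,
   otherwise 3 if the opcode is above 0x59 (unsigned), else 1. */
static size_t step_o1(unsigned char op)
{
    if (op == 0x09)
        return 1;
    return op > 0x59 ? 3 : 1;
}

/* Step computed by the -O1 -ftree-vectorize listing:
   setne + lea 0x1(%rax,%rcx,2) gives 1 + 2 * (op != 0x09).
   The comparison against 0x59 is no longer present. */
static size_t step_vect(unsigned char op)
{
    return 1 + 2 * (size_t)(op != 0x09);
}
```

Under this reading, the two versions agree for opcode 0x09 and for opcodes
above 0x59, but any other opcode at or below 0x59 advances the index by 3
instead of 1 in the -ftree-vectorize build.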
Extract of the correct output (-O1):
----
addrmap[0]=0
addrmap[3]=3
addrmap[4]=4
addrmap[7]=7
addrmap[10]=10
addrmap[13]=13
addrmap[16]=16
addrmap[19]=19
addrmap[22]=22
addrmap[23]=22
----
With -O1 -ftree-vectorize, only addrmap[0] and addrmap[3] are correct:
----
addrmap[0]=0
addrmap[3]=3
addrmap[4]=0
addrmap[7]=32767
addrmap[10]=16777216
addrmap[13]=0
addrmap[16]=469314288
addrmap[19]=32767
addrmap[22]=469315151
addrmap[23]=32767
----
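One possible explanation for the garbage values, assuming the step sizes read
from the two listings are right: the -ftree-vectorize loop jumps from index 3
to index 6, so addrmap[4] and the later entries are never written and keep
whatever was in memory. The sketch below replays both index walks on a
hypothetical 24-byte bytecode that I laid out only to match the instruction
offsets of the -O1 extract (the real bytecode is not shown in this report);
walk and sample are made-up names.

```c
#include <stddef.h>

/* Record in `out` every index the loop would write to in addrmap.
   buggy != 0 uses the step seen in the -ftree-vectorize listing. */
static size_t walk(const unsigned char *code, size_t len, int buggy,
                   size_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char op = code[i];
        out[n++] = i;
        if (buggy)
            i += 1 + 2 * (size_t)(op != 0x09);
        else
            i += (op != 0x09 && op > 0x59) ? 3 : 1;
    }
    return n;
}

/* Hypothetical bytecode: instructions start at
   0, 3, 4, 7, 10, 13, 16, 19, 22, 23 as in the -O1 extract.
   0x64 stands for any opcode above 0x59, 0x01 for any 1-byte
   opcode other than 0x09. */
static const unsigned char sample[24] = {
    0x64, 0, 0,   /*  0 */
    0x01,         /*  3 */
    0x64, 0, 0,   /*  4 */
    0x64, 0, 0,   /*  7 */
    0x64, 0, 0,   /* 10 */
    0x64, 0, 0,   /* 13 */
    0x64, 0, 0,   /* 16 */
    0x64, 0, 0,   /* 19 */
    0x09,         /* 22 */
    0x01          /* 23 */
};
```

With buggy == 0 the walk visits 0, 3, 4, 7, 10, 13, 16, 19, 22, 23; with
buggy != 0 it visits 0, 3, 6, 9, 12, 15, 18, 21. Only indexes 0 and 3 are
common to both, which would be consistent with the output above.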
See also:
http://bugs.python.org/issue9880
My setup:
* Intel(R) Pentium(R) 4 CPU 3.00GHz
* Debian Sid
* gcc (Debian 20110106-1) 4.6.0 20110106 (experimental) [trunk revision
168538]
* Python 3.2 (r87949)