https://llvm.org/bugs/show_bug.cgi?id=28845
Bug ID: 28845
Summary: Incorrect codegen for "store <2 x i48>" triggered by -fslp-vectorize-aggressive
Product: new-bugs
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedb...@nondot.org
Reporter: babo...@gmail.com
CC: elena.demikhov...@intel.com, llvm-bugs@lists.llvm.org, vsevolod.livins...@frtk.ru
Classification: Unclassified

Created attachment 16882
  --> https://llvm.org/bugs/attachment.cgi?id=16882&action=edit
reproducer

The attached test case has a structure with a number of bit fields. Ignore the odd C definition of the structure; only the LLVM IR definition matters.

LLVM IR structure definition:

    %struct.struct_1 = type { [6 x i8], [6 x i8], i24 }

Initialization is a read-modify-write of two 48-bit chunks, no magic here.

    define void @_Z4initv() local_unnamed_addr #0 {
    entry:
      %bf.load = load i48, i48* bitcast (%struct.struct_1* @s1 to i48*), align 8
      %bf.clear = and i48 %bf.load, -8796091973633
      %bf.set3 = or i48 %bf.clear, 7326889148416
      store i48 %bf.set3, i48* bitcast (%struct.struct_1* @s1 to i48*), align 8
      %bf.load4 = load i48, i48* bitcast ([6 x i8]* getelementptr inbounds (%struct.struct_1, %struct.struct_1* @s1, i64 0, i32 1) to i48*), align 2
      %bf.clear5 = and i48 %bf.load4, -2198956146689
      %bf.set6 = or i48 %bf.clear5, 822419128320
      store i48 %bf.set6, i48* bitcast ([6 x i8]* getelementptr inbounds (%struct.struct_1, %struct.struct_1* @s1, i64 0, i32 1) to i48*), align 2
      ret void
    }

But when the test case is compiled with -fslp-vectorize-aggressive, this is optimized to vector operations:

    define void @_Z4initv() local_unnamed_addr #0 {
    entry:
      %bf.load = load <2 x i48>, <2 x i48>* bitcast (%struct.struct_1* @s1 to <2 x i48>*), align 8
      %bf.clear = and <2 x i48> %bf.load, <i48 -8796091973633, i48 -2198956146689>
      %bf.set3 = or <2 x i48> %bf.clear, <i48 7326889148416, i48 822419128320>
      store <2 x i48> %bf.set3, <2 x i48>* bitcast (%struct.struct_1* @s1 to <2 x i48>*), align 8
      ret void
    }

This seems legal, but it leads to incorrect code generation. More specifically, instead of two *consecutive* 48-bit stores, the stores happen with a 16-bit gap between them.

Good:

    movl %ecx, s1(%rip)
    movw %cx, s1+4(%rip)
    movl %ecx, s1+6(%rip)
    movw %cx, s1+10(%rip)

Bad:

    movq %xmm1, s1(%rip)
    movq %xmm0, s1+8(%rip)

So the problem is that <2 x i48> means two different things: the LLVM IR assumes consecutive 48-bit locations (byte offsets 0 and 6), while code generation assumes each element is 64-bit aligned (byte offsets 0 and 8). My guess is that it's a codegen bug.

To reproduce:

    >clang++ func.cpp test.cpp -o out -fslp-vectorize-aggressive -O2
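To make the two layouts concrete, here is a small C++ sketch (not part of the attached reproducer) that models the 16-byte image of s1 and applies both store patterns to it. The names Image, put_le, store_packed and store_widened are made up for illustration, and the sketch assumes that the upper 16 bits of each widened 64-bit lane are zero, which the report does not state.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of s1 in memory: two 6-byte bit-field groups at
// offsets 0 and 6, then the i24 field at offset 12, 16 bytes in total.
using Image = std::array<std::uint8_t, 16>;

// Write the low n bytes of v at byte offset off, least significant byte first.
inline void put_le(Image& img, std::size_t off, std::uint64_t v, std::size_t n) {
    assert(off + n <= img.size());
    for (std::size_t i = 0; i < n; ++i)
        img[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Layout the IR relies on: lanes at byte offsets 0 and 6, 6 bytes each.
inline Image store_packed(Image img, std::uint64_t lane0, std::uint64_t lane1) {
    assert((lane0 >> 48) == 0 && (lane1 >> 48) == 0);
    put_le(img, 0, lane0, 6);
    put_le(img, 6, lane1, 6);
    return img;
}

// Layout the two movq stores produce: 8 bytes at offsets 0 and 8.
// Assumption: each lane is zero-extended from 48 to 64 bits.
inline Image store_widened(Image img, std::uint64_t lane0, std::uint64_t lane1) {
    assert((lane0 >> 48) == 0 && (lane1 >> 48) == 0);
    put_le(img, 0, lane0, 8);
    put_le(img, 8, lane1, 8);
    return img;
}
```

Under these assumptions the widened store puts the second lane at bytes 8..13 instead of 6..11, overwrites bytes 6..7 of the second bit-field group with the top of the first lane, and also touches bytes 12..15, where the i24 field lives. The packed store leaves bytes 12..15 alone.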