https://llvm.org/bugs/show_bug.cgi?id=31492
Bug ID: 31492
Summary: [PPC] slower vsx instructions generated for vmac
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: PowerPC
Assignee: unassignedb...@nondot.org
Reporter: car...@google.com
CC: llvm-bugs@lists.llvm.org
Classification: Unclassified

Created attachment 17789
  --> https://llvm.org/bugs/attachment.cgi?id=17789&action=edit
testcase

The attached test case is simplified from vmac. Compile it with the options
-m64 -O2 -mvsx -mcpu=power8

LLVM generates the following code for the while loop:

.LBB0_2:                                # %while.body
                                        # =>This Inner Loop Header: Depth=1
        lxvd2x 0, 0, 7          // *
        lxvd2x 1, 0, 6          // *
        xxswapd 34, 0           // *
        xxswapd 35, 1           // *
        vaddudm 2, 3, 2         // *
        xxswapd 10, 34          // *
        mfvsrd 9, 34            // *
        mfvsrd 10, 10           // *
        #APP
        mulhdu 11, 10, 9
        #NO_APP
        lxvd2x 11, 7, 8
        lxvd2x 12, 0, 5
        mulld 9, 9, 10
        addi 7, 7, 64
        xxswapd 50, 11
        xxswapd 51, 12
        vaddudm 2, 19, 18
        xxswapd 13, 34
        mfvsrd 0, 34
        mfvsrd 12, 13
        mulld 10, 0, 12
        #APP
        mulhdu 12, 12, 0
        #NO_APP
        #APP
        addc 3, 9, 10
        adde 4, 11, 12
        #NO_APP
        bdnz .LBB0_2

There are two problems:

1. (kp)[i] is loop invariant, so its load can be hoisted out of the loop.

2. LLVM generates the VSX code marked * for the expression

       get64PE((mp) + i) + (kp)[i]

   Using simple integer load and add instructions instead would be shorter
   and faster. For large inputs, it can be 35% faster.

This looks like a cost model problem in vectorization?
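For illustration, here is a rough C sketch of the scalar shape that the two points above ask for. It is not the attached test case, which is not reproduced in this mail. The names nh_sketch, load64, mul64 and add128 are hypothetical, load64 is only a stand-in for get64PE, the loop is simplified to one message pair per iteration, and unsigned __int128 is a GCC/Clang extension used here in place of the inline mulhdu/addc/adde asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for get64PE: load one 64-bit word as stored. */
static inline uint64_t load64(const uint64_t *p) { return *p; }

/* 64x64 -> 128-bit multiply. The high half corresponds to mulhdu,
   the low half to mulld. */
static inline void mul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}

/* 128-bit accumulate with carry, the role played by addc/adde. */
static inline void add128(uint64_t *rh, uint64_t *rl, uint64_t th, uint64_t tl)
{
    uint64_t sum = *rl + tl;
    *rh += th + (sum < tl);   /* (sum < tl) is the carry out of the low half */
    *rl = sum;
}

/* Sums (mp[2n] + kp[0]) * (mp[2n+1] + kp[1]) over npairs message pairs.
   Assumption: the key words do not change inside the loop. */
void nh_sketch(const uint64_t *mp, const uint64_t kp[2], size_t npairs,
               uint64_t *rh, uint64_t *rl)
{
    assert(mp != NULL && kp != NULL && rh != NULL && rl != NULL);

    /* Problem 1: the loop-invariant key loads are hoisted by hand. */
    const uint64_t k0 = kp[0];
    const uint64_t k1 = kp[1];
    uint64_t h = 0, l = 0;

    for (size_t n = 0; n < npairs; n++, mp += 2) {
        uint64_t th, tl;
        /* Problem 2: plain 64-bit loads and adds, with no vector load,
           swap, or vector-to-GPR move before the multiply. */
        mul64(load64(mp) + k0, load64(mp + 1) + k1, &th, &tl);
        add128(&h, &l, th, tl);
    }
    *rh = h;
    *rl = l;
}
```

Whether a compiler emits exactly ld/add/mulld/mulhdu for this depends on the target and options, so treat it as a description of the desired code shape rather than a measured result.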