https://bugs.llvm.org/show_bug.cgi?id=45897
Bug ID: 45897
Summary: [X86][SSE] Improve combining to ISD::MULHS/MULHU
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedb...@nondot.org
Reporter: llvm-...@redking.me.uk
CC: craig.top...@gmail.com, llvm-bugs@lists.llvm.org,
llvm-...@redking.me.uk, spatel+l...@rotateright.com
This is a general bug/brain dump to cover a few areas that we might want to
consider to encourage use of the vXi16 ISD::MULHS/MULHU opcodes where possible,
as they can give notable perf improvements over vXi32 multiplies.
1 - Move the PPC combines from https://reviews.llvm.org/D78272 into DAGCombiner
if x86 can benefit as well.
2 - X86ISelLowering.cpp - combinePMULH currently performs:
vXi16 trunc(srl(mul({sz}ext(x),{sz}ext(y)),16)) -> vXi16 mulh{sz}(x,y)
But it might be beneficial to combine any x/y with sufficient leading sign/zero
bits. Whether that is a win depends on how cheap the truncation will be
(PACKS/VTRUNC/nop/???) compared to the penalty of the wider multiply/shift:
vXi16 trunc(srl(mul(x,y),16)) -> vXi16 mulh{sz}(trunc(x),trunc(y))
3 - Try to perform more truncation-style combines even after combining to
PACKS/VTRUNC ops. We can probably still see the truncation pattern behind the
op, so maybe a SelectionDAG::simplifyTrunc() would be useful, or an x86-only
combineTruncLike? Or at the very least try to match the SimplifyDemandedBits
calls we have for ISD::TRUNCATE.
4 - See if [Bug #38423] would still be useful - does
SimplifyDemandedBits/shuffle combining always simplify the vXi64 mul expansion
for us?
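
For reference, a scalar sketch of the identity behind the combines in item 2.
This models a single lane in plain C++ and is not LLVM code: the helper names
(mulhs16, mulhu16, truncSrlMul, checkSampledLanes) are made up for
illustration. It assumes the usual two's complement narrowing conversions
(guaranteed from C++20, and what mainstream compilers do anyway). The last
comment in the block notes why the wider form needs enough leading sign/zero
bits on x/y.

```cpp
#include <cassert>
#include <cstdint>

// One lane of a signed "multiply high": upper 16 bits of the exact product.
static int16_t mulhs16(int16_t x, int16_t y) {
  int32_t wide = int32_t(x) * int32_t(y); // |product| <= 2^30, no overflow
  return int16_t(uint16_t(uint32_t(wide) >> 16));
}

// One lane of an unsigned "multiply high".
static uint16_t mulhu16(uint16_t x, uint16_t y) {
  uint32_t wide = uint32_t(x) * uint32_t(y); // < 2^32, no wrap
  return uint16_t(wide >> 16);
}

// One lane of the wide pattern: i16 trunc(srl(mul(x, y), 16)) on i32 inputs.
static uint16_t truncSrlMul(uint32_t x, uint32_t y) {
  return uint16_t((x * y) >> 16); // i32 mul wraps modulo 2^32
}

// Check the identity on a sampled grid (a full sweep is 2^32 pairs).
static void checkSampledLanes() {
  for (int x = -32768; x <= 32767; x += 251) {
    for (int y = -32768; y <= 32767; y += 251) {
      int16_t sx = int16_t(x), sy = int16_t(y);
      uint16_t ux = uint16_t(sx), uy = uint16_t(sy);

      // sext inputs: the wide pattern matches the signed multiply high.
      uint32_t wsx = uint32_t(int32_t(sx)), wsy = uint32_t(int32_t(sy));
      assert(truncSrlMul(wsx, wsy) == uint16_t(mulhs16(sx, sy)));

      // zext inputs: the wide pattern matches the unsigned multiply high.
      assert(truncSrlMul(uint32_t(ux), uint32_t(uy)) == mulhu16(ux, uy));
    }
  }
  // Without enough leading sign/zero bits the rewrite is wrong, e.g.
  // x = 0x10000, y = 1: the wide pattern gives 1, but trunc(x) is 0.
}
```

Any i32 x/y that already fits in i16 (at least 17 sign bits, or 16 leading
zero bits for the unsigned case) is equal to the sext/zext of its own
truncation, which is why the generalised form is expected to hold there too.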