Issue: 128390
Summary: AMDGPU is missing simplify demanded bits optimizations of readfirstlane and similar operations
Labels: good first issue, backend:AMDGPU, missed-optimization
Assignees:
Reporter: arsenm

We are missing known bits and demanded bits optimizations that look through readfirstlane, readlane, and DPP operations. We need to insert extensions to produce a legal type, but these imply inserting instructions to set the high bits of the input value. Those high bits are never needed, so starting from the use context of the trunc, we should be able to delete the extension.
```
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s
target triple = "amdgcn-amd-amdhsa"

; v_and_b32_e32 v0, 0xff, v0 ; Should be able to delete this
; v_readfirstlane_b32 s4, v0
; v_mov_b32_e32 v0, s4
define i8 @readfirstlane_demanded_i8_zext(i8 %src) {
  %zext = zext i8 %src to i32
  %readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %zext)
  %trunc = trunc i32 %readfirstlane to i8
  ret i8 %trunc
}

; v_bfe_i32 v0, v0, 0, 8 ; Should be able to delete this
; v_readfirstlane_b32 s4, v0
; v_mov_b32_e32 v0, s4
define i8 @readfirstlane_demanded_i8_sext(i8 %src) {
  %sext = sext i8 %src to i32
  %readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %sext)
  %trunc = trunc i32 %readfirstlane to i8
  ret i8 %trunc
}
```
This should be accomplished by implementing SimplifyDemandedBitsForTargetNode in SITargetLowering. It should handle INTRINSIC_WO_CHAIN operations, with Intrinsic::amdgcn_readfirstlane as the base example.
As an example, these appear in the tests from #128388.
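
To illustrate why the extension is dead here, the following is a small standalone C++ model, not LLVM code. It assumes readfirstlane simply copies the 32-bit value of the first active lane (falling back to lane 0 when no lane is active), so each result bit depends only on the same bit of the source. The 4-lane `Wave` type and all function names are hypothetical and exist only for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy wave: one 32-bit register value per lane (4 lanes is an arbitrary choice).
constexpr std::size_t kLanes = 4;
using Wave = std::array<std::uint32_t, kLanes>;

// Model of readfirstlane: return the value of the first lane whose bit is set
// in `exec`; fall back to lane 0 when no lane is active (modeling assumption).
std::uint32_t readFirstLane(const Wave &v, unsigned exec) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (exec & (1u << i))
      return v[i];
  return v[0];
}

// Per-lane zero-extension from 8 bits, like `v_and_b32 v0, 0xff, v0`.
Wave zextI8(Wave v) {
  for (std::uint32_t &x : v)
    x &= 0xffu;
  return v;
}

// Per-lane sign-extension from 8 bits, like `v_bfe_i32 v0, v0, 0, 8`.
Wave sextI8(Wave v) {
  for (std::uint32_t &x : v)
    x = ((x & 0xffu) ^ 0x80u) - 0x80u; // unsigned sign-extension idiom
  return v;
}

// Model of `trunc i32 to i8`: only the low 8 bits are demanded.
std::uint8_t truncI8(std::uint32_t x) { return static_cast<std::uint8_t>(x); }

// True when dropping the extension before readfirstlane leaves the truncated
// result unchanged, for both the zext and the sext variant.
bool extensionIsDead(const Wave &src, unsigned exec) {
  const std::uint8_t plain = truncI8(readFirstLane(src, exec));
  return truncI8(readFirstLane(zextI8(src), exec)) == plain &&
         truncI8(readFirstLane(sextI8(src), exec)) == plain;
}
```

In this model the demanded bits of the result map one-to-one onto the demanded bits of the source operand, which is the property a SimplifyDemandedBitsForTargetNode implementation would rely on when forwarding the demanded mask through the intrinsic.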