Issue 125950
Summary AMDGPU: can optimize away v_readfirstlane_b32 on implicit_def input
Labels
Assignees
Reporter rampitec
For the small testcase:
```
; RUN: llc -march=amdgcn -mcpu=gfx900 < %s

; Only one readfirstlane is really needed

define amdgpu_ps void @test(i32 %in, ptr %out) {
  %v1 = insertelement <2 x i32> <i32 poison, i32 poison>, i32 %in, i32 0
  %v2 = bitcast <2 x i32> %v1 to i64
  %v3 = call i64 @llvm.amdgcn.s.quadmask.i64(i64 %v2)
  %p = inttoptr i64 %v2 to ptr addrspace(4)
  store i64 %v3, ptr %out
  ret void
}
```

We produce code with two v_readfirstlane_b32 instructions. Only one is needed, since the second one reads undef:

```
        v_readfirstlane_b32 s0, v0
        v_readfirstlane_b32 s1, v1
        s_quadmask_b64 s[0:1], s[0:1]
        v_mov_b32_e32 v4, s1
        v_mov_b32_e32 v3, s0
        flat_store_dwordx2 v[1:2], v[3:4]
        s_endpgm
```
However, the backend does not see that the input is undef, because it is hidden behind the REG_SEQUENCE.
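
For illustration, the pattern after instruction selection presumably looks roughly like the MIR below. This is a hand-written sketch, not an actual dump: the virtual register numbers, register classes, and exact operand lists are assumptions. The point is only that the second v_readfirstlane_b32 reads a subregister whose value comes from an IMPLICIT_DEF through a REG_SEQUENCE.

```
# Hypothetical, simplified MIR; not taken from real llc output.
%0:vgpr_32 = COPY $vgpr0                      # %in
%1:vgpr_32 = IMPLICIT_DEF                     # poison high half
%2:vreg_64 = REG_SEQUENCE %0, %subreg.sub0, %1, %subreg.sub1
%3:sreg_32 = V_READFIRSTLANE_B32 %2.sub0, implicit $exec
%4:sreg_32 = V_READFIRSTLANE_B32 %2.sub1, implicit $exec   # reads undef
%5:sreg_64 = REG_SEQUENCE %3, %subreg.sub0, %4, %subreg.sub1
%6:sreg_64 = S_QUADMASK_B64 %5, implicit-def $scc
```

If the backend looked through the first REG_SEQUENCE, it could in principle replace %4 with an IMPLICIT_DEF of an SGPR and drop the second readfirstlane.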