| Issue | 139908 |
| --- | --- |
| Summary | [AMDGPU] SIFoldOperands incorrectly folding constant splat reg sequence |
| Labels | new issue |
| Assignees | |
| Reporter | npmiller |
## Description
The fold operands pass tries to fold constants into their uses, including through register sequences.
In some cases it does so when it shouldn't; for example, consider the following MIR:
```
%14:sreg_32 = S_MOV_B32 42
%15:sreg_64 = REG_SEQUENCE %14, %subreg.sub0, %14, %subreg.sub1
%16:sreg_64 = V_CMP_EQ_U64_e64 killed %21, %15, implicit $exec
```
It gets folded into this:
```
%14:sreg_32 = S_MOV_B32 42
%15:sreg_64 = REG_SEQUENCE %14, %subreg.sub0, %14, %subreg.sub1
%16:sreg_64 = V_CMP_EQ_U64_e64 killed %21, 42, implicit $exec
```
As you can see, the 42 was moved over the register sequence and directly into the comparison instruction. This is incorrect because the comparison will now only check against the value 42 (sign-extended to 64 bits), instead of the 64-bit splat of two 42s, i.e. `0x0000002A0000002A`.
There are some constant values for which this is valid: `-1` or `0`, for example, would be fine, since the splat is essentially just extending them from 32 bits to 64 bits, which the immediate operand can handle. But this is not correct in the general case.
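To make that concrete, here is a small standalone program (my own illustration, not LLVM code) comparing the 64-bit splat of a 32-bit value against its 64-bit sign-extension; folding the 32-bit value as an immediate is only value-preserving when the two agree:
```cpp
#include <cstdint>
#include <cstdio>

int main() {
  for (int32_t V : {42, 0, -1}) {
    // What the REG_SEQUENCE materializes: V in both 32-bit halves.
    uint64_t Splat = (uint64_t(uint32_t(V)) << 32) | uint32_t(V);
    // What a 64-bit operand sees when V is folded as an immediate.
    uint64_t SExt = uint64_t(int64_t(V));
    std::printf("V=%3d splat=0x%016llx sext=0x%016llx match=%d\n", (int)V,
                (unsigned long long)Splat, (unsigned long long)SExt,
                (int)(Splat == SExt));
  }
  // V= 42 splat=0x0000002a0000002a sext=0x000000000000002a match=0
  // V=  0 splat=0x0000000000000000 sext=0x0000000000000000 match=1
  // V= -1 splat=0xffffffffffffffff sext=0xffffffffffffffff match=1
}
```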
Some lit tests check that splats of `-1` do get folded by the pass (for example `test/CodeGen/AMDGPU/GlobalISel/andn2.ll`).
This folding is handled in [`SIFoldOperands.cpp:919`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp#L919).
If the register sequence has more than one register, it performs the following check:
```cpp
if (!TII->isInlineConstant(*Op, OpTy) ||
    !TII->isOperandLegal(*UseMI, UseOpIdx, Op))
  return false;
```
This only ensures that each individual sub-value can be folded as an immediate; it doesn't check whether the composition of the values in the register sequence can be represented as an immediate.
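As a standalone illustration (using the `[-16, 64]` integer inline-constant range from the note further down, and no LLVM APIs), each `42` sub-value passes a per-value range check, while the composed 64-bit value would not:
```cpp
#include <cstdint>
#include <cstdio>

// Simplified stand-in for the integer case of isInlineConstant.
static bool inInlineRange(int64_t V) { return V >= -16 && V <= 64; }

int main() {
  int32_t Sub = 42; // each half of the REG_SEQUENCE
  uint64_t Composed = (uint64_t(uint32_t(Sub)) << 32) | uint32_t(Sub);
  std::printf("sub-value inlineable: %d\n", (int)inInlineRange(Sub));
  std::printf("composed inlineable:  %d  (0x%llx)\n",
              (int)inInlineRange((int64_t)Composed),
              (unsigned long long)Composed);
  // Prints: sub-value inlineable: 1
  //         composed inlineable:  0  (0x2a0000002a)
}
```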
Additionally, the folding is only performed for splats:
```cpp
if (Imm != SubImm)
  return false; // Can only fold splat constants
```
So clearly this code needs additional checks to avoid invalid folds, but I'm not sure exactly what they should be. It's possible that some instructions can handle these folded splats, but the fold is definitely being applied to some instructions that can't.
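One possible shape for such a check (a sketch under my own assumptions, not a patch against the actual `SIFoldOperands.cpp` logic) is to package the splat-vs-sign-extension comparison from above into a guard and only fold when it holds:
```cpp
#include <cstdint>

// Hypothetical guard, standalone for illustration. SubImm is the
// 32-bit value splatted across both halves of the REG_SEQUENCE.
static bool isSafeSplatFold(int32_t SubImm) {
  // What the REG_SEQUENCE materializes: SubImm in both 32-bit halves.
  uint64_t Splat = (uint64_t(uint32_t(SubImm)) << 32) | uint32_t(SubImm);
  // What a 64-bit operand sees if SubImm is folded as an immediate
  // (immediates are sign-extended to the operand width).
  uint64_t Folded = uint64_t(int64_t(SubImm));
  return Splat == Folded; // holds only for 0 and -1
}
```
Under such a guard the `-1` splats exercised by `test/CodeGen/AMDGPU/GlobalISel/andn2.ll` would still fold, while the `42` splat from the reproducer below would be rejected.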
Note: `isInlineConstant` checks that the value is between `-16` and `64`, so a larger constant value in the splat doesn't trigger the issue.
Note: this was discovered downstream in [intel/llvm](https://github.com/intel/llvm), during SYCL testing.
## Reproducer
```llvm
; ModuleID = 'repro.bc'
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"

define i32 @repro_kernel(i64 %in) {
entry:
  ; 180388626474 == 0x0000002A0000002A, the 64-bit splat of 42
  %.not = icmp eq i64 %in, 180388626474
  br i1 %.not, label %2, label %1

1:
  ret i32 1

2:
  ret i32 2
}
```
Incorrect assembly snippet from that IR file (note the comparison against the plain immediate `42`):
```
$ ./bin/llc repro.ll -o repro.s
$ cat repro.s
[...]
repro_kernel:                           ; @repro_kernel
; %bb.0:                                ; %entry
    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
    v_cmp_eq_u64_e32 vcc, 42, v[0:1]
    v_mov_b32_e32 v0, 1
    s_and_saveexec_b64 s[4:5], vcc
    s_cbranch_execz .LBB0_2
; %bb.1:
    v_mov_b32_e32 v0, 2
.LBB0_2:                                ; %UnifiedReturnBlock
    s_or_b64 exec, exec, s[4:5]
    s_setpc_b64 s[30:31]
[...]
```
MIR snippet before the fold operands pass:
```
$ ./bin/llc repro.ll -stop-before=si-fold-operands -o before.mir
$ cat before.mir
[...]
body: |
  bb.0.entry:
    successors: %bb.1(0x40000000), %bb.2(0x40000000)
    liveins: $vgpr0, $vgpr1

    %11:vgpr_32 = COPY $vgpr1
    %10:vgpr_32 = COPY $vgpr0
    %21:vreg_64 = REG_SEQUENCE %10, %subreg.sub0, %11, %subreg.sub1
    %14:sreg_32 = S_MOV_B32 42
    %15:sreg_64 = REG_SEQUENCE %14, %subreg.sub0, %14, %subreg.sub1
    %16:sreg_64 = V_CMP_EQ_U64_e64 killed %21, %15, implicit $exec
    %12:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
    %0:sreg_64 = SI_IF killed %16, %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
    S_BRANCH %bb.1

  bb.1 (%ir-block.3):
    successors: %bb.2(0x80000000)

    %18:vgpr_32 = V_MOV_B32_e32 2, implicit $exec

  bb.2.UnifiedReturnBlock:
    %1:vgpr_32 = PHI %12, %bb.0, %18, %bb.1
    SI_END_CF %0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
    $vgpr0 = COPY %1
    SI_RETURN implicit $vgpr0
[...]
```
MIR snippet after running the fold operands pass:
```
$ ./bin/llc before.mir --run-pass=si-fold-operands -o bad.mir
$ cat bad.mir
[...]
body: |
  bb.0.entry:
    successors: %bb.1(0x40000000), %bb.2(0x40000000)
    liveins: $vgpr0, $vgpr1

    %11:vgpr_32 = COPY $vgpr1
    %10:vgpr_32 = COPY $vgpr0
    %21:vreg_64 = REG_SEQUENCE %10, %subreg.sub0, %11, %subreg.sub1
    %14:sreg_32 = S_MOV_B32 42
    %15:sreg_64 = REG_SEQUENCE %14, %subreg.sub0, %14, %subreg.sub1
    %16:sreg_64 = V_CMP_EQ_U64_e64 killed %21, 42, implicit $exec
    %12:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
    %0:sreg_64 = SI_IF killed %16, %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
    S_BRANCH %bb.1

  bb.1 (%ir-block.3):
    successors: %bb.2(0x80000000)

    %18:vgpr_32 = V_MOV_B32_e32 2, implicit $exec

  bb.2.UnifiedReturnBlock:
    %1:vgpr_32 = PHI %12, %bb.0, %18, %bb.1
    SI_END_CF %0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
    $vgpr0 = COPY %1
    SI_RETURN implicit $vgpr0
[...]
```