Issue 132245
Summary [RISC-V] Unprofitable zext icmp combine on vectors
Labels backend:RISC-V
Assignees
Reporter lukel97
    InstCombine will combine this zext of an icmp, where the icmp's source has only a single possible set bit, into an lshr plus trunc:

```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) {
  %1 = and <vscale x 1 x i64> %x, splat (i64 8)
  %2 = icmp ne <vscale x 1 x i64> %1, splat (i64 0)
  %3 = zext <vscale x 1 x i1> %2 to <vscale x 1 x i8>
  ret <vscale x 1 x i8> %3
}
```

into:

```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) #0 {
  %1 = and <vscale x 1 x i64> %x, splat (i64 8)
  %.lobit = lshr exact <vscale x 1 x i64> %1, splat (i64 3)
  %2 = trunc nuw nsw <vscale x 1 x i64> %.lobit to <vscale x 1 x i8>
  ret <vscale x 1 x i8> %2
}
```

In a loop, this ends up being unprofitable for RISC-V because the codegen now goes from:

```asm
f: # @f
	.cfi_startproc
# %bb.0:
	vsetvli	a0, zero, e64, m1, ta, ma
	vand.vi	v8, v8, 8
	vmsne.vi	v0, v8, 0
	vsetvli	zero, zero, e8, mf8, ta, ma
	vmv.v.i	v8, 0
	vmerge.vim	v8, v8, 1, v0
	ret
```

to a series of narrowing vnsrl.wi instructions:

```asm
f:                                      # @f
	.cfi_startproc
# %bb.0:
	vsetvli	a0, zero, e64, m1, ta, ma
	vand.vi	v8, v8, 8
	vsetvli	zero, zero, e32, mf2, ta, ma
	vnsrl.wi	v8, v8, 3
	vsetvli	zero, zero, e16, mf4, ta, ma
	vnsrl.wi	v8, v8, 0
	vsetvli	zero, zero, e8, mf8, ta, ma
	vnsrl.wi	v8, v8, 0
	ret
```

In the original form, the vmv.v.i is loop invariant and gets hoisted out, and the vmerge.vim usually gets folded into a masked instruction, so you typically end up with just a vsetvli + vmsne.vi in the loop body.
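
To illustrate the masked-instruction fold (a minimal sketch, not taken from the original report; the accumulator add is just a hypothetical in-loop user of the zext):

```llvm
; In the compare form, the zero splat behind the vmerge.vim is loop invariant,
; and with a user like this add, the 0/1 select can often be folded into a
; masked add, leaving roughly vand.vi + vmsne.vi + the masked user per iteration.
define <vscale x 1 x i8> @g(<vscale x 1 x i64> %x, <vscale x 1 x i8> %acc) {
  %m = and <vscale x 1 x i64> %x, splat (i64 8)
  %c = icmp ne <vscale x 1 x i64> %m, splat (i64 0)
  %z = zext <vscale x 1 x i1> %c to <vscale x 1 x i8>
  %sum = add <vscale x 1 x i8> %acc, %z
  ret <vscale x 1 x i8> %sum
}
```

With the lshr + trunc form of the same function, the vnsrl.wi chain and its vtype toggles stay inside the loop body.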

The truncate, on the other hand, requires multiple instructions and introduces a vtype toggle for each one, and is measurably slower on the BPI-F3.

I think we have enough information to reverse the combine somewhere.

I noticed this while working on #132180 when trying to widen AnyOf reductions.