[llvm-bugs] [Bug 144227] [SelectionDAG][x86] Shuffle pyramid is not eliminated

LLVM Bugs via llvm-bugs Sat, 14 Jun 2025 07:49:52 -0700

Issue	144227
Summary	[SelectionDAG][x86] Shuffle pyramid is not eliminated
Labels	new issue
Assignees
Reporter	sutajo

    https://godbolt.org/z/zEonMr7da

The above example illustrates the problem.


There are 2 minimum reductions in this snippet, but only the second is compiled into the efficient form:
```
vextracti128 xmm1, ymm0, 1
vpminuw xmm0, xmm0, xmm1
vphminposuw xmm0, xmm0
```
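The efficient form can be modeled in plain C++ to show why these three instructions compute a 16-lane unsigned minimum. `vextracti128` and `vpminuw` fold the upper 128 bits onto the lower 128 bits, leaving 8 candidate words. `vphminposuw` then returns the smallest unsigned word in bits [15:0] and its index in bits [18:16]. This is a sketch. The helper names `phminposuw` and `reduce_umin_v16i16` are hypothetical, and the `std::array` types stand in for registers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of PHMINPOSUW: minimum unsigned word of 8 lanes goes to word 0
// (bits [15:0]); the index of its first occurrence goes to bits [18:16],
// i.e. the low 3 bits of word 1. All other bits are zero.
std::array<uint16_t, 8> phminposuw(const std::array<uint16_t, 8>& x) {
  unsigned idx = 0;
  for (unsigned i = 1; i < 8; ++i)
    if (x[i] < x[idx]) idx = i;  // strict '<' keeps the lowest index on ties
  std::array<uint16_t, 8> r{};   // zero-initialized result
  r[0] = x[idx];
  r[1] = static_cast<uint16_t>(idx);
  assert(idx < 8);
  return r;
}

// vextracti128 + vpminuw + vphminposuw for a 16 x u16 umin reduction.
uint16_t reduce_umin_v16i16(const std::array<uint16_t, 16>& v) {
  std::array<uint16_t, 8> folded{};
  for (unsigned i = 0; i < 8; ++i)
    folded[i] = v[i] < v[i + 8] ? v[i] : v[i + 8];  // vpminuw(lo, hi)
  return phminposuw(folded)[0];                     // word 0 holds the minimum
}
```

The single fold is all that is needed before `vphminposuw`, because that instruction already performs the remaining 8-to-1 reduction in hardware.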

The first one is still computed with a pyramid of shuffles and mins, even though a `vphminposuw` is emitted for it as well:
```
vextracti128    xmm2, ymm0, 1
vpminuw xmm2, xmm0, xmm2
vpshufd xmm3, xmm2, 238                 # xmm3 = xmm2[2,3,2,3]
vpminuw xmm3, xmm2, xmm3
vpshufd xmm4, xmm3, 85                  # xmm4 = xmm3[1,1,1,1]
vpminuw xmm3, xmm3, xmm4
vpsrld  xmm4, xmm3, 16
vphminposuw     xmm2, xmm2
```
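The pyramid in this listing works at dword granularity. The `vpshufd` immediate 238 (`0b11101110`) selects source dwords [2,3,2,3], and 85 (`0b01010101`) selects [1,1,1,1]. The final `vpsrld 16` moves word 1 of each dword into word 0. The C++ model below assumes one more `vpminuw` of xmm3 and xmm4 after the `vpsrld`, because the excerpt stops before it. All function names are hypothetical stand-ins for the instructions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<uint16_t, 8>;  // one xmm register viewed as 8 x u16

// Decode a PSHUFD immediate: result dword i takes source dword (imm >> 2i) & 3.
std::array<unsigned, 4> pshufd_selectors(unsigned imm) {
  std::array<unsigned, 4> sel{};
  for (unsigned i = 0; i < 4; ++i) sel[i] = (imm >> (2 * i)) & 3u;
  return sel;
}

// PSHUFD on the 8-word view: dword d consists of words 2d and 2d+1.
V8 vpshufd(const V8& x, unsigned imm) {
  auto sel = pshufd_selectors(imm);
  V8 r{};
  for (unsigned d = 0; d < 4; ++d) {
    r[2 * d] = x[2 * sel[d]];
    r[2 * d + 1] = x[2 * sel[d] + 1];
  }
  return r;
}

// PSRLD by 16: the high word of each dword moves down, the high word becomes 0.
V8 vpsrld16(const V8& x) {
  V8 r{};
  for (unsigned d = 0; d < 4; ++d) r[2 * d] = x[2 * d + 1];
  return r;
}

// PMINUW: lane-wise unsigned minimum of 16-bit words.
V8 vpminuw(const V8& a, const V8& b) {
  V8 r{};
  for (unsigned i = 0; i < 8; ++i) r[i] = a[i] < b[i] ? a[i] : b[i];
  return r;
}

// Pyramid starting from xmm2 = vpminuw(lo128, hi128).
// ASSUMPTION: the last vpminuw (xmm3, xmm4) is not shown in the excerpt.
uint16_t pyramid(const V8& xmm2) {
  V8 xmm3 = vpminuw(xmm2, vpshufd(xmm2, 238));  // dwords [2,3,2,3]
  xmm3 = vpminuw(xmm3, vpshufd(xmm3, 85));      // dwords [1,1,1,1]
  V8 xmm4 = vpsrld16(xmm3);                     // word 1 -> word 0
  return vpminuw(xmm3, xmm4)[0];
}
```

For example, `pshufd_selectors(238)` yields {2, 3, 2, 3}. Each level halves the number of useful lanes (8, 4, 2, 1), which is the same work `vphminposuw` does in a single instruction.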

Normally, `llvm.vector.reduce.umin.v16i16` is converted into a series of vector_shuffle and umin operations in the SelectionDAG.

```
Initial selection DAG: %bb.0 'test_reduce_v16i16_with_sharing:start'
SelectionDAG has 47 nodes:
  t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t4: v16i16,ch = load<(load (s256))> t0, t2, undef:i64
    t9: v16i16 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t4, poison:v16i16
 t10: v16i16 = umin t4, t9
    t11: v16i16 = vector_shuffle<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> t10, poison:v16i16
  t12: v16i16 = umin t10, t11
    t13: v16i16 = vector_shuffle<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t12, poison:v16i16
  t14: v16i16 = umin t12, t13
  t17: i32 = Constant<0>
      t15: v16i16 = vector_shuffle<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t14, poison:v16i16
    t16: v16i16 = umin t14, t15
  t19: i16 = extract_vector_elt t16, Constant:i64<0>
  t21: v1i16 = insert_vector_elt poison:v1i16, t19, Constant:i64<0>
      t22: v16i16 = concat_vectors t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21
    t24: v16i1 = setcc t4, t22, seteq:ch
      t6: i64,ch = CopyFromReg t0, Register:i64 %1
 t7: v16i16,ch = load<(load (s256))> t0, t6, undef:i64
    t26: v16i16 = BUILD_VECTOR Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>
  t27: v16i16 = vselect t24, t7, t26
    t28: v16i16 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t27, poison:v16i16
 t29: v16i16 = umin t27, t28
    t30: v16i16 = vector_shuffle<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> t29, poison:v16i16
  t31: v16i16 = umin t29, t30
    t32: v16i16 = vector_shuffle<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t31, poison:v16i16
  t33: v16i16 = umin t31, t32
  t38: i16,i16 = merge_values undef:i16, undef:i16
 t39: i16,i16 = merge_values t19, undef:i16
        t34: v16i16 = vector_shuffle<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t33, poison:v16i16
 t35: v16i16 = umin t33, t34
    t36: i16 = extract_vector_elt t35, Constant:i64<0>
  t40: i16,i16 = merge_values t39, t36
  t43: ch,glue = CopyToReg t0, Register:i16 $ax, t40
  t45: ch,glue = CopyToReg t43, Register:i16 $dx, t40:1, t43:1
  t46: ch = X86ISD::RET_GLUE t45, TargetConstant:i32<0>, Register:i16 $ax, Register:i16 $dx, t45:1
```
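The shuffle/umin pyramid in the dump (t9 through t16, and likewise t28 through t35) can be modeled at the DAG level. In this sketch, `std::nullopt` plays the role of a `u` (undefined) mask lane. As a simplification, `umin` of an undefined lane is treated as undefined. Under that model, only lane 0 of the final vector carries a value, which matches the report's claim about `V` below. The names `shuffle`, `umin` and `reduce_pyramid` are hypothetical and do not correspond to LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A v16i16 value whose lanes may be undefined (std::nullopt ~ 'u').
using Lane = std::optional<uint16_t>;
using V16 = std::array<Lane, 16>;

// Single-source vector_shuffle: mask[i] selects a lane of v; lanes past
// the end of the mask (or with a negative entry) are undefined.
V16 shuffle(const V16& v, const std::vector<int>& mask) {
  V16 r{};  // all lanes start undefined
  for (std::size_t i = 0; i < mask.size() && i < 16; ++i)
    if (mask[i] >= 0) r[i] = v[static_cast<std::size_t>(mask[i])];
  return r;
}

// Lane-wise umin; simplification: defined only if both inputs are defined.
V16 umin(const V16& a, const V16& b) {
  V16 r{};
  for (std::size_t i = 0; i < 16; ++i)
    if (a[i] && b[i]) r[i] = *a[i] < *b[i] ? *a[i] : *b[i];
  return r;
}

// The four levels t9..t16 from the dump above.
V16 reduce_pyramid(const V16& t4) {
  V16 t10 = umin(t4, shuffle(t4, {8, 9, 10, 11, 12, 13, 14, 15}));
  V16 t12 = umin(t10, shuffle(t10, {4, 5, 6, 7}));
  V16 t14 = umin(t12, shuffle(t12, {2, 3}));
  return umin(t14, shuffle(t14, {1}));  // t16: only lane 0 is defined
}
```

Because the real DAG keeps t16 as a full v16i16 node, any user of t16 other than `extract_vector_elt` (such as the splat t22) forces all four levels to be materialized.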

In the x86 backend, the `combineExtractVectorElt` function is supposed to detect binary-operation reductions and replace them with a more efficient implementation. The problem is that `combineExtractVectorElt` only replaces the result of `extract_vector_elt` (in this case t19 and t36). It does not replace the reduced vectors t16 and t35.

During DAG combining, t22 is transformed into `vector_shuffle<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0> t16, undef:v16i16` (a splat of lane 0 of t16). Because t22 still uses t16, it keeps the whole reduction chain alive.

A possible solution would be:
Let `N = extract_vector_elt(V, 0)`, where `V` comes from a reduction.
If `N` gets replaced with `M`, we could also replace `V` with `scalar_to_vector(M)`.
This would eliminate all references to the inefficient reduction.
This is safe to do, since:
- All elements of `V` except the zeroth are known to be undefined.
- `M` does not depend on `V`, so no cycles are introduced.
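The liveness argument can be checked on a toy graph model. This is a hand-rolled model and does not use LLVM's SelectionDAG API. `replace_all_uses` only mimics the spirit of ReplaceAllUsesWith. The nodes `M` (the efficient reduction of t4), `S` (`scalar_to_vector(M)`) and `ret` (a stand-in for the remaining users) are hypothetical. Only the first reduction is modeled, and the second (t28 through t35) behaves the same way.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy DAG: node name -> operand names (a model, not LLVM's SDNode graph).
using Graph = std::map<std::string, std::vector<std::string>>;

// Rewrite every use of 'from' into 'to', in the spirit of ReplaceAllUsesWith.
void replace_all_uses(Graph& g, const std::string& from, const std::string& to) {
  for (auto& entry : g)
    for (auto& op : entry.second)
      if (op == from) op = to;
}

// Nodes reachable from 'root'; anything unreachable would be pruned as dead.
std::set<std::string> live_nodes(const Graph& g, const std::string& root) {
  std::set<std::string> live;
  std::vector<std::string> work{root};
  while (!work.empty()) {
    std::string n = work.back();
    work.pop_back();
    if (!live.insert(n).second) continue;  // already visited
    auto it = g.find(n);
    if (it != g.end())
      for (const auto& op : it->second) work.push_back(op);
  }
  return live;
}

// State after t19 was replaced by the efficient result M: the splat t22
// (vector_shuffle<0,...> t16) still reads the pyramid output t16.
Graph after_extract_combine() {
  return Graph{
      {"t9", {"t4"}},           {"t10", {"t4", "t9"}},
      {"t11", {"t10"}},         {"t12", {"t10", "t11"}},
      {"t13", {"t12"}},         {"t14", {"t12", "t13"}},
      {"t15", {"t14"}},         {"t16", {"t14", "t15"}},
      {"M", {"t4"}},            // hypothetical efficient reduction of t4
      {"t22", {"t16"}},         // splat of lane 0 of t16
      {"t24", {"t4", "t22"}},   // setcc t4, t22
      {"ret", {"M", "t24"}},    // hypothetical stand-in for remaining users
  };
}

// Proposed fix: additionally replace V (t16) with S = scalar_to_vector(M).
// M does not depend on t16, so the rewrite cannot create a cycle.
Graph with_proposed_fix() {
  Graph g = after_extract_combine();
  g["S"] = {"M"};
  replace_all_uses(g, "t16", "S");
  return g;
}
```

In the first graph, `t16` and the rest of the pyramid remain reachable through `t22`. After the proposed replacement, the only path from `t22` leads through `S` to `M`, so t9 through t16 become dead.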

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
