[llvm-bugs] [Bug 31283] New: Terrible ARM shuffle lowering for extend from <2 x i8> to <8 x i8>

via llvm-bugs Mon, 05 Dec 2016 16:22:20 -0800

https://llvm.org/bugs/show_bug.cgi?id=31283


            Bug ID: 31283
           Summary: Terrible ARM shuffle lowering for extend from <2 x i8>
                    to <8 x i8>
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: ARM
          Assignee: unassignedb...@nondot.org
          Reporter: efrie...@codeaurora.org
                CC: llvm-bugs@lists.llvm.org
    Classification: Unclassified

Testcase:

define <8 x i8> @f(<2 x i8> *%p) {
  %t = load <2 x i8>, <2 x i8> *%p
  %r = shufflevector <2 x i8> %t, <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
  ret <8 x i8> %r
}

With llc -mtriple=armv7--linux-gnueabihf:

        vld1.16 {d16[0]}, [r0:16]
        vldr    d18, .LCPI0_0
        vmovl.u8        q8, d16
        vmovl.u16       q8, d16
        vtbl.8  d0, {d16}, d18
        bx      lr

The first instruction is great: vld1.16 produces exactly the result we want.
The problem is the four instructions that follow, which together add up to an
identity shuffle on the two loaded bytes.  I think what's happening is that we
treat vld1.16+vmovl.u8+vmovl.u16 as a single, legal DAG node
("load<LD2[%p], anyext from v2i8>"), so shuffle combining never gets a chance
to eliminate the extra shuffles.  Excerpts from SelectionDAG debug output:


Optimized type-legalized selection DAG: BB#0 'f:'
SelectionDAG has 14 nodes:
  t0: ch = EntryToken
        t18: i32 = extract_vector_elt t16, Constant:i32<0>
        t21: i32 = extract_vector_elt t16, Constant:i32<1>
      t24: v8i8 = BUILD_VECTOR t18, t21, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32
    t9: f64 = bitcast t24
  t11: ch,glue = CopyToReg t0, Register:f64 %D0, t9
    t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
  t16: v2i32,ch = load<LD2[%p], anyext from v2i8> t0, t2, undef:i32
  t12: ch = ARMISD::RET_FLAG t11, Register:f64 %D0, t11:1


Legalized selection DAG: BB#0 'f:'
SelectionDAG has 15 nodes:
  t0: ch = EntryToken
            t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
          t16: v2i32,ch = load<LD2[%p], anyext from v2i8> t0, t2, undef:i32
        t25: v8i8 = bitcast t16
            t38: i32 = ARMISD::Wrapper TargetConstantPool:i32<<8 x i8> <i8 0, i8 4, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>> 0
          t35: f64,ch = load<LD8[ConstantPool]> t0, t38, undef:i32
        t36: v8i8 = bitcast t35
      t32: v8i8 = ARMISD::VTBL1 t25, t36
    t9: f64 = bitcast t32
  t11: ch,glue = CopyToReg t0, Register:f64 %D0, t9
  t12: ch = ARMISD::RET_FLAG t11, Register:f64 %D0, t11:1
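
As a sanity check on the identity-shuffle claim, here is a rough Python model
of the byte lanes in d16 across the emitted sequence.  This is only a sketch,
not LLVM code: the helper names are made up for illustration, a little-endian
lane layout is assumed, and the lanes that vld1.16 {d16[0]} does not write are
modeled as zero (in reality they keep whatever was in the register).

```python
def widen_low_half(d_bytes, lane_bytes):
    """Model vmovl.uN qD, dM, then keep only the low D register of qD.

    Each lane of `lane_bytes` bytes is zero-extended to twice its width
    (little-endian), and the first 8 bytes of the 16-byte result are returned.
    """
    out = []
    for i in range(0, 8, lane_bytes):
        out.extend(d_bytes[i:i + lane_bytes])  # original lane bytes
        out.extend([0] * lane_bytes)           # zero extension
    return out[:8]


def vtbl8(table, indices):
    """Model vtbl.8 with a one-register table: out-of-range indices give 0."""
    return [table[i] if i < len(table) else 0 for i in indices]


def lowered_sequence(a, b):
    """Byte contents of d0 after the sequence in the report (hypothetical model)."""
    d16 = [a, b, 0, 0, 0, 0, 0, 0]   # vld1.16 {d16[0]}; other lanes assumed 0
    d16 = widen_low_half(d16, 1)     # vmovl.u8  q8, d16
    d16 = widen_low_half(d16, 2)     # vmovl.u16 q8, d16
    mask = [0, 4] + [0xFF] * 6       # constant pool <0, 4, -1, -1, -1, -1, -1, -1>
    return vtbl8(d16, mask)          # vtbl.8 d0, {d16}, d18


if __name__ == "__main__":
    print(lowered_sequence(0x12, 0x34))
```

Under those assumptions the two bytes written by vld1.16 end up in lanes 0 and
1 of d0, which is where the load already put them; the remaining lanes are
undef in the IR, so nothing after the load is needed.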
