[libcxxabi] [clang] [openmp] [libc] [clang-tools-extra] [mlir] [flang] [compiler-rt] [libcxx] [llvm] [AArch64] Combine store (trunc X to <3 x i8>) to sequence of ST1.b. (PR #78637)

Eli Friedman via cfe-commits Tue, 23 Jan 2024 11:17:50 -0800

================
@@ -281,23 +279,19 @@ entry:
 define void @store_trunc_add_from_64bits(ptr %src, ptr %dst) {
 ; CHECK-LABEL: store_trunc_add_from_64bits:
 ; CHECK:       ; %bb.0: ; %entry
-; CHECK-NEXT:    sub sp, sp, #16
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK-NEXT:    ldr s0, [x0]
 ; CHECK-NEXT:    add x9, x0, #4
 ; CHECK-NEXT:  Lloh0:
 ; CHECK-NEXT:    adrp x8, lCPI7_0@PAGE
 ; CHECK-NEXT:  Lloh1:
 ; CHECK-NEXT:    ldr d1, [x8, lCPI7_0@PAGEOFF]
+; CHECK-NEXT:    add x8, x1, #1
 ; CHECK-NEXT:    ld1.h { v0 }[2], [x9]
+; CHECK-NEXT:    add x9, x1, #2
 ; CHECK-NEXT:    add.4h v0, v0, v1
-; CHECK-NEXT:    xtn.8b v1, v0
-; CHECK-NEXT:    umov.h w8, v0[2]
-; CHECK-NEXT:    str s1, [sp, #12]
-; CHECK-NEXT:    ldrh w9, [sp, #12]
----------------
efriedma-quic wrote:


Spent a little time looking at why the default code is so horrible; the primary 
issue is actually the way the legalizer (GenWidenVectorStores) is trying to 
lower the operation into an i16 store followed by an i8 store. It ends up 
generating a bitcast from v4i8 to v2i16, and the default handling for that is 
completely terrible (it doesn't know how to use a shuffle, so it goes through a 
stack temporary).

Maybe worth looking into improving the bitcast situation if you're going to 
continue looking at very narrow vector types.

https://github.com/llvm/llvm-project/pull/78637
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[libcxxabi] [clang] [openmp] [libc] [clang-tools-extra] [mlir] [flang] [compiler-rt] [libcxx] [llvm] [AArch64] Combine store (trunc X to <3 x i8>) to sequence of ST1.b. (PR #78637)

Reply via email to