================
@@ -281,23 +279,19 @@ entry:
 define void @store_trunc_add_from_64bits(ptr %src, ptr %dst) {
 ; CHECK-LABEL: store_trunc_add_from_64bits:
 ; CHECK: ; %bb.0: ; %entry
-; CHECK-NEXT: sub sp, sp, #16
-; CHECK-NEXT: .cfi_def_cfa_offset 16
 ; CHECK-NEXT: ldr s0, [x0]
 ; CHECK-NEXT: add x9, x0, #4
 ; CHECK-NEXT: Lloh0:
 ; CHECK-NEXT: adrp x8, lCPI7_0@PAGE
 ; CHECK-NEXT: Lloh1:
 ; CHECK-NEXT: ldr d1, [x8, lCPI7_0@PAGEOFF]
+; CHECK-NEXT: add x8, x1, #1
 ; CHECK-NEXT: ld1.h { v0 }[2], [x9]
+; CHECK-NEXT: add x9, x1, #2
 ; CHECK-NEXT: add.4h v0, v0, v1
-; CHECK-NEXT: xtn.8b v1, v0
-; CHECK-NEXT: umov.h w8, v0[2]
-; CHECK-NEXT: str s1, [sp, #12]
-; CHECK-NEXT: ldrh w9, [sp, #12]
----------------
efriedma-quic wrote:
I spent a little time looking at why the default code is so bad. The primary issue is the way the legalizer (GenWidenVectorStores) lowers the operation into an i16 store followed by an i8 store. That ends up generating a bitcast from v4i8 to v2i16, and the default handling for that bitcast is very poor: it doesn't know how to use a shuffle, so it goes through a stack temporary. It may be worth improving the bitcast handling if you're going to continue looking at very narrow vector types.

https://github.com/llvm/llvm-project/pull/78637
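
To make the store split concrete, here is a rough C++ sketch of what an i16 store followed by an i8 store of a 3-byte vector amounts to. It is only an illustration, not the legalizer's code: `store_v3i8_split` is a hypothetical name, and the two `memcpy` calls stand in for the v4i8 to v2i16 bitcast (lanes 0 and 1 reinterpreted as one 16-bit element) and the i16 store.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an LLVM API): write the low three lanes of a
// 4 x i8 vector to dst as one 16-bit store followed by one 8-bit store.
void store_v3i8_split(const std::array<std::uint8_t, 4>& v, std::uint8_t* dst) {
    // Stand-in for "bitcast v4i8 -> v2i16, take element 0":
    // lanes 0 and 1 are reinterpreted as a single 16-bit value.
    std::uint16_t lo;
    std::memcpy(&lo, v.data(), sizeof lo);

    // The i16 store covers bytes 0 and 1 of the destination.
    std::memcpy(dst, &lo, sizeof lo);

    // The i8 store covers byte 2; byte 3 is never written.
    dst[2] = v[2];
}
```

The reinterpretation needs no data movement in principle, which is why routing the bitcast through a stack temporary is wasteful.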