On Mon, Mar 9, 2020 at 10:26 AM Wilco Dijkstra wrote:
>
> Hi Christophe,
>
> > I noticed a regression introduced by Delia's patch "aarch64: ACLE
> > intrinsics for BFCVTN, BFCVTN2 and BFCVT":
> > (on aarch64-linux-gnu)
> > FAIL: g++.dg/cpp0x/variadic-sizeof4.C -std=c++14 (internal compiler error)
Hi Christophe,
> I noticed a regression introduced by Delia's patch "aarch64: ACLE
> intrinsics for BFCVTN, BFCVTN2 and BFCVT":
> (on aarch64-linux-gnu)
> FAIL: g++.dg/cpp0x/variadic-sizeof4.C -std=c++14 (internal compiler error)
>
> I couldn't reproduce it with current ToT, until I realized that
On Fri, 6 Mar 2020 at 16:03, Wilco Dijkstra wrote:
>
> Inline assembler instructions don't have latency info and the scheduler does
> not attempt to schedule them at all - it does not even honor latencies of
> asm source operands. As a result, SIMD intrinsics which are implemented using
> inline a
> +;; vmlal_lane_s16 intrinsics
> +(define_insn "aarch64_vec_mlal_lane"
> + [(set (match_operand: 0 "register_operand" "=w")
> + (plus: (match_operand: 1 "register_operand" "0")
> + (mult:
> + (ANY_EXTEND:
> + (match_operand: 2 "register_operand" "w"))
> + (ANY_
Inline assembler instructions don't have latency info and the scheduler does
not attempt to schedule them at all - it does not even honor latencies of
asm source operands. As a result, SIMD intrinsics which are implemented using
inline assembler perform very poorly, particularly on in-order cores.
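
For readers unfamiliar with the intrinsic named in the patch comment, here is a
minimal scalar sketch in C of what vmlal_lane_s16 computes. Each 16-bit element
of one vector is sign-extended, multiplied by one selected lane of a second
vector, and added to a 32-bit accumulator. The function name ref_vmlal_lane_s16
and its array-based signature are hypothetical, chosen for illustration only.
This is not the arm_neon.h implementation and not part of the patch, and it
models only the signed variant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference model (illustration only) of the
 * widening multiply-accumulate by lane performed by vmlal_lane_s16:
 *
 *     acc[i] = acc[i] + (int32_t)b[i] * (int32_t)v[lane]
 *
 * acc  : four 32-bit accumulators, updated in place
 * b    : four 16-bit multiplicands
 * v    : four 16-bit values, of which only v[lane] is used
 * lane : index into v, must be in the range 0..3
 */
void ref_vmlal_lane_s16(int32_t acc[4], const int16_t b[4],
                        const int16_t v[4], int lane)
{
    assert(lane >= 0 && lane < 4);

    for (int i = 0; i < 4; i++) {
        /* The product of two sign-extended 16-bit values always fits
         * in 32 bits, so this multiply cannot overflow. */
        int32_t prod = (int32_t)b[i] * (int32_t)v[lane];

        /* Accumulate in unsigned arithmetic so that wrap-around is
         * well defined in C, matching a non-saturating accumulate. */
        acc[i] = (int32_t)((uint32_t)acc[i] + (uint32_t)prod);
    }
}
```

Expressing this as an RTL pattern (a plus of a mult of extended operands, the
shape visible in the quoted define_insn) rather than as opaque asm text is what
lets the compiler see the operation and its operands.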