On 2/22/22 04:36, matheus.fe...@eldorado.org.br wrote:
From: "Lucas Mateus Castro (alqotel)" <lucas.cas...@eldorado.org.br>
Changed vmulhuw, vmulhud, vmulhsw, and vmulhsd to be implemented
with inline TCG code instead of helpers.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.ara...@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.fe...@eldorado.org.br>
---
Changes in v4:
Changed from gvec to i64. This improved performance for all four
instructions on a Power host, and for vmulhsw and vmulhuw on an x86
host, but worsened performance for vmulhsd and vmulhud on x86.
Unsurprising.
+static void do_vx_vmulhd_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, bool sign)
+{
+    TCGv_i64 a1, b1, mask, w, k;
+    void (*tcg_gen_shift_imm)(TCGv_i64, TCGv_i64, int64_t);
+
+    a1 = tcg_temp_new_i64();
+    b1 = tcg_temp_new_i64();
+    w = tcg_temp_new_i64();
+    k = tcg_temp_new_i64();
+    mask = tcg_temp_new_i64();
+    if (sign) {
+        tcg_gen_shift_imm = tcg_gen_sari_i64;
+    } else {
+        tcg_gen_shift_imm = tcg_gen_shri_i64;
+    }
+
+    tcg_gen_movi_i64(mask, 0xFFFFFFFF);
+    tcg_gen_and_i64(a1, a, mask);
+    tcg_gen_and_i64(b1, b, mask);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_shri_i64(k, t, 32);
+
+    tcg_gen_shift_imm(a1, a, 32);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_add_i64(t, t, k);
+    tcg_gen_and_i64(k, t, mask);
+    tcg_gen_shift_imm(w, t, 32);
+
+    tcg_gen_and_i64(a1, a, mask);
+    tcg_gen_shift_imm(b1, b, 32);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_add_i64(t, t, k);
+    tcg_gen_shift_imm(k, t, 32);
+
+    tcg_gen_shift_imm(a1, a, 32);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_add_i64(t, t, w);
+    tcg_gen_add_i64(t, t, k);
You should be using tcg_gen_mul{s,u}2_i64 instead of open-coding the high-part
multiplication.
r~