On 01/25/2018 09:03 AM, Peter Maydell wrote:
> On 17 January 2018 at 16:14, Richard Henderson
> <richard.hender...@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
>> ---
>>  target/arm/translate-a64.c | 386 ++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 329 insertions(+), 57 deletions(-)
>>
>> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
>> index 2495414603..1b5005637d 100644
>> --- a/target/arm/translate-a64.c
>> +++ b/target/arm/translate-a64.c
>> @@ -6489,17 +6489,6 @@ static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
>>      }
>>  }
>>
>> -/* Common SHL/SLI - Shift left with an optional insert */
>> -static void handle_shli_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
>> -                                 bool insert, int shift)
>> -{
>> -    if (insert) { /* SLI */
>> -        tcg_gen_deposit_i64(tcg_res, tcg_res, tcg_src, shift, 64 - shift);
>> -    } else { /* SHL */
>> -        tcg_gen_shli_i64(tcg_res, tcg_src, shift);
>> -    }
>> -}
>> -
>>  /* SRI: shift right with insert */
>>  static void handle_shri_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
>>                                   int size, int shift)
>> @@ -6603,7 +6592,11 @@ static void handle_scalar_simd_shli(DisasContext *s, bool insert,
>>      tcg_rn = read_fp_dreg(s, rn);
>>      tcg_rd = insert ? read_fp_dreg(s, rd) : tcg_temp_new_i64();
>>
>> -    handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift);
>> +    if (insert) {
>> +        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, shift, 64 - shift);
>> +    } else {
>> +        tcg_gen_shli_i64(tcg_rd, tcg_rn, shift);
>> +    }
>
> It looks like you're folding handle_shli_with_ins() into its
> now only callsite, but handle_shri_with_ins() has been left as
> its own function?

I didn't notice that.  I'll have a look.

>> +static void gen_shr8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
>> +{
>> +    uint64_t mask = (0xff >> shift) * (-1ull / 0xff);
>> +    TCGv_i64 t = tcg_temp_new_i64();
>> +
>> +    tcg_gen_shri_i64(t, a, shift);
>> +    tcg_gen_andi_i64(t, t, mask);
>> +    tcg_gen_andi_i64(d, d, ~mask);
>> +    tcg_gen_or_i64(d, d, t);
>> +    tcg_temp_free_i64(t);
>
> The previous code was able to work with just shifts and deposits --
> why do we need to open-code this kind of mask-and-or now? Is this
> because we now operate an i64 at a time when we used to operate
> on smaller quantities at once?

Yes, exactly.  It's now 4 total operations instead of 16.

I should also tidy this to use a new dup_const function that's been
introduced since I first wrote this code...


r~
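
For readers following the mask-and-or discussion, here is a rough host-side
sketch in plain C of what the four generated operations compute, checked
against a lane-at-a-time version.  It is only an illustration, not code from
the patch: dup8(), shr8_ins_swar(), shr8_ins_ref() and shr8_ins_check() are
hypothetical names, dup8() merely stands in for the replication that the
patch open-codes as (-1ull / 0xff), and the shift is assumed to be in the
range 1..8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replicate an 8-bit value into all eight bytes.
 * UINT64_MAX / 0xff is 0x0101010101010101, as in the patch's mask. */
static uint64_t dup8(uint8_t v)
{
    return (uint64_t)v * (UINT64_MAX / 0xff);
}

/* All eight byte lanes at once: one shift, two ANDs, one OR.
 * Assumes 1 <= shift <= 8. */
static uint64_t shr8_ins_swar(uint64_t d, uint64_t a, unsigned shift)
{
    uint64_t mask = dup8(0xff >> shift);  /* bits each lane takes from a */
    uint64_t t = (a >> shift) & mask;     /* drop bits leaked from the lane above */

    return (d & ~mask) | t;               /* keep the top 'shift' bits of each d lane */
}

/* Reference: the same insert done one byte lane at a time. */
static uint64_t shr8_ins_ref(uint64_t d, uint64_t a, unsigned shift)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t dl = (uint8_t)(d >> (i * 8));
        uint8_t al = (uint8_t)(a >> (i * 8));
        uint8_t m = (uint8_t)(0xff >> shift);
        uint8_t out = (uint8_t)((dl & ~m) | (al >> shift));

        r |= (uint64_t)out << (i * 8);
    }
    return r;
}

/* Compare both versions for every shift on a couple of bit patterns. */
static void shr8_ins_check(void)
{
    const uint64_t d = 0x0123456789abcdefull;
    const uint64_t a = 0xfedcba9876543210ull;

    for (unsigned shift = 1; shift <= 8; shift++) {
        assert(shr8_ins_swar(d, a, shift) == shr8_ins_ref(d, a, shift));
        assert(shr8_ins_swar(a, d, shift) == shr8_ins_ref(a, d, shift));
    }
}
```

The AND with the mask after the shift is the step a single 64-bit deposit
cannot provide: shifting the whole word drags the low bits of each byte into
the lane below it, so they have to be cleared in every lane at once.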