On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
> NOTE: there are a lot of instructions that cannot (yet) be optimised to use
> TCG vector operations, however it struck me that there may be some potential
> for converting saturating add/sub and cmp instructions if there were a
> mechanism to return a set of flags indicating the result of the
> saturation/comparison.

There are also a lot of instructions that can be converted, but aren't:

  * vspltis[bhw] can use tcg_gen_gvec_dup{8,16,32}i.

  * vsplt{b,h,w} can use tcg_gen_gvec_dup_mem.  Note that you'll need
    something like vec_reg_offset from target/arm/translate-a64.h to
    compute the offset of the specific byte/word/long from which we
    are to splat.

  * vmr should be handled by having tcg_gen_gvec_or notice aofs == bofs.
    For ARM, we do special case this during translation.  But since
    tcg/tcg-op.c does these things for tcg_gen_or_i64, we should
    probably handle the same set of transformations.

  * vnot would need to be handled by actually adding a tcg_gen_gvec_nor
    and then also noticing aofs == bofs.

For saturation, I think the easiest thing to do is represent SAT as a
ppc_avr_t.  We notice saturation by also computing normal arithmetic
and comparing to see if they differ.  E.g.

    tcg_gen_gvec_add(vece, offsetof_avr_tmp,
                     offsetof(ra), offsetof(rb), 16, 16);
    tcg_gen_gvec_ssadd(vece, offsetof(rt),
                       offsetof(ra), offsetof(rb), 16, 16);
    tcg_gen_gvec_cmp(TCG_COND_NE, vece, offsetof_avr_tmp,
                     offsetof_avr_tmp, offsetof(rt), 16, 16);
    tcg_gen_gvec_or(vece, offsetof_avr_sat, offsetof_avr_sat,
                    offsetof_avr_tmp, 16, 16);

You only need to convert the ppc_avr_t to a single bit when reading VSCR.

For comparisons... that's tricky.  I wonder if there's anything better than

    tcg_gen_gvec_cmp(TCG_COND_FOO, vece, offsetof(rt),
                     offsetof(ra), offsetof(rb), 16, 16);
    if (rc) {
        TCGv_i64 hi, lo, t, f;

        tcg_gen_ld_i64(hi, cpu_env, offsetof(rt));
        tcg_gen_ld_i64(lo, cpu_env, offsetof(rt) + 8);
        tcg_gen_and_i64(t, hi, lo);
        tcg_gen_or_i64(f, hi, lo);
        tcg_gen_setcondi_i64(TCG_COND_EQ, t, t, -1);
        tcg_gen_setcondi_i64(TCG_COND_EQ, f, f, 0);
        // truncate to i32, shift, or, and set to cr6.
    }


r~
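
To make the two tricks above concrete, here is a plain C model of what the generated vector ops would compute, for byte lanes only. It is a sketch and not QEMU code: vec128, vec_add_wrap, vec_ssadd, vec_cmp_ne, vec_or, vaddsbs_model, sat_bit and cr6_from_cmp are all hypothetical names invented for illustration. The CR6 bit positions (all-true in bit 3, all-false in bit 1 of the 4-bit field) reflect my reading of the AltiVec convention and should be checked against the ISA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 16  /* 16 byte lanes = one 128-bit vector */

/* Hypothetical stand-in for ppc_avr_t, byte lanes only. */
typedef struct { uint8_t u8[LANES]; } vec128;
static_assert(sizeof(vec128) == LANES, "model vector must be 128 bits");

/* Interpret a byte as a signed 8-bit value without relying on
 * implementation-defined narrowing conversions. */
static int to_s8(uint8_t x) { return x < 128 ? (int)x : (int)x - 256; }

/* Lane-wise wrapping add (models tcg_gen_gvec_add at byte size). */
static vec128 vec_add_wrap(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < LANES; i++) {
        r.u8[i] = (uint8_t)(a.u8[i] + b.u8[i]);
    }
    return r;
}

/* Lane-wise signed saturating add (models tcg_gen_gvec_ssadd). */
static vec128 vec_ssadd(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < LANES; i++) {
        int s = to_s8(a.u8[i]) + to_s8(b.u8[i]);
        if (s > 127) s = 127;
        if (s < -128) s = -128;
        r.u8[i] = (uint8_t)s;  /* int -> unsigned is well-defined modulo 256 */
    }
    return r;
}

/* Lane-wise compare: all-ones where the lanes differ, zero otherwise. */
static vec128 vec_cmp_ne(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < LANES; i++) {
        r.u8[i] = a.u8[i] != b.u8[i] ? 0xff : 0x00;
    }
    return r;
}

static vec128 vec_or(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < LANES; i++) {
        r.u8[i] = a.u8[i] | b.u8[i];
    }
    return r;
}

/* The four-op sequence from the mail: a lane saturated exactly when the
 * wrapping and saturating results differ; accumulate that into a sticky
 * vector instead of a single bit. */
static void vaddsbs_model(vec128 *rt, vec128 *sat, vec128 ra, vec128 rb)
{
    vec128 tmp = vec_add_wrap(ra, rb);
    *rt = vec_ssadd(ra, rb);
    tmp = vec_cmp_ne(tmp, *rt);
    *sat = vec_or(*sat, tmp);
}

/* Collapse the sticky vector to one bit, only needed when VSCR is read. */
static bool sat_bit(const vec128 *sat)
{
    for (int i = 0; i < LANES; i++) {
        if (sat->u8[i]) return true;
    }
    return false;
}

/* Model of the rc path: hi/lo are the two 64-bit halves of the compare
 * result. Bit positions are an assumption, see the note above. */
static unsigned cr6_from_cmp(uint64_t hi, uint64_t lo)
{
    unsigned all_true  = (hi & lo) == UINT64_MAX;
    unsigned all_false = (hi | lo) == 0;
    return (all_true << 3) | (all_false << 1);
}
```

With lane 0 set to 100 + 100 and lane 1 to 1 + 2, vaddsbs_model leaves rt lane 0 at 127 and marks only lane 0 in the sticky vector, so sat_bit reports saturation. The appeal of the vector-wide flag is that the translated code never has to reduce to a scalar on the hot path.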