On 09/28/2016 10:11 AM, Nikunj A Dadhania wrote:
> Richard Henderson <r...@twiddle.net> writes:
>
>> On 09/27/2016 10:31 PM, Nikunj A Dadhania wrote:
>>> +DEF_HELPER_1(bswap16x4, i64, i64)
>>
>> DEF_HELPER_FLAGS_1(bswap16x4, TCG_CALL_NO_RWG_SE, i64, i64)
>>
>>> +    uint64_t m = 0x00ff00ff00ff00ffull;
>>> +    return ((x & m) << 8) | ((x >> 8) & m);
>>
>> ... although I suppose this is only 5 instructions, and could reasonably be
>> done inline too.  Especially if you shared the one 64-bit constant across the
>> two bswaps.
>
> Something like this:
>
> static void gen_bswap16x4(TCGv_i64 val)
> {
>     TCGv_i64 mask = tcg_const_i64(0x00FF00FF00FF00FF);
>     TCGv_i64 t0 = tcg_temp_new_i64();
>     TCGv_i64 t1 = tcg_temp_new_i64();
>
>     /* val = ((val & mask) << 8) | ((val >> 8) & mask) */
>     tcg_gen_and_i64(t0, val, mask);
>     tcg_gen_shri_i64(t0, t0, 8);
>     tcg_gen_shli_i64(t1, val, 8);
>     tcg_gen_and_i64(t1, t1, mask);
>     tcg_gen_or_i64(val, t0, t1);
>
>     tcg_temp_free_i64(t0);
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(mask);
> }

Like that, except that since you always perform this twice, you should share
the expensive constant load.  Recall also that you need temporaries for the
store, so

static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
                          TCGv_i64 inh, TCGv_i64 inl)


r~
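
For reference, the shape suggested above can be modelled in plain C on host integers. This is only a sketch of the data flow, not TCG code: the names bswap16x4_model and bswap16x8_model are hypothetical, and the uint64_t values stand in for the TCGv_i64 temporaries. It follows the formula from the helper, ((x & m) << 8) | ((x >> 8) & m), masking before the left shift and after the right shift, and builds the mask once for both halves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes inside each of the four 16-bit lanes of x.
 * The mask is a parameter so the caller can materialise it once. */
static uint64_t bswap16x4_model(uint64_t x, uint64_t mask)
{
    return ((x & mask) << 8) | ((x >> 8) & mask);
}

/* Hypothetical host-side model of the proposed gen_bswap16x8():
 * separate outputs for the high and low doublewords, one shared
 * constant for both swaps. */
static void bswap16x8_model(uint64_t *outh, uint64_t *outl,
                            uint64_t inh, uint64_t inl)
{
    const uint64_t mask = 0x00ff00ff00ff00ffULL;  /* loaded once */

    assert(outh != NULL && outl != NULL);
    *outh = bswap16x4_model(inh, mask);
    *outl = bswap16x4_model(inl, mask);
}
```

Because the inputs are passed by value and the results go to separate outputs, the inputs are left untouched, which is the role the temporaries play before the store.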