On 09/28/2016 10:11 AM, Nikunj A Dadhania wrote:
> Richard Henderson <r...@twiddle.net> writes:
> 
>> On 09/27/2016 10:31 PM, Nikunj A Dadhania wrote:
>>> +DEF_HELPER_1(bswap16x4, i64, i64)
>>
>> DEF_HELPER_FLAGS_1(bswap16x4, TCG_CALL_NO_RWG_SE, i64, i64)
>>
>>> +    uint64_t m = 0x00ff00ff00ff00ffull;
>>> +    return ((x & m) << 8) | ((x >> 8) & m);
>>
>> ... although I suppose this is only 5 instructions, and could reasonably be
>> done inline too.  Especially if you shared the one 64-bit constant across the
>> two bswaps.
> 
> Something like this:
> 
> static void gen_bswap16x4(TCGv_i64 val)
> {
>     TCGv_i64 mask = tcg_const_i64(0x00FF00FF00FF00FF);
>     TCGv_i64 t0 = tcg_temp_new_i64();
>     TCGv_i64 t1 = tcg_temp_new_i64();
> 
>     /* val = ((val & mask) << 8) | ((val >> 8) & mask) */
>     tcg_gen_and_i64(t0, val, mask);
>     tcg_gen_shli_i64(t0, t0, 8);
>     tcg_gen_shri_i64(t1, val, 8);
>     tcg_gen_and_i64(t1, t1, mask);
>     tcg_gen_or_i64(val, t0, t1);
> 
>     tcg_temp_free_i64(t0);
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(mask);
> }

Like that, except that since you always perform this twice, you should share
the expensive constant load.  Recall also that you need temporaries for the
store, so

static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
                          TCGv_i64 inh, TCGv_i64 inl)
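
For reference, the arithmetic that such a function has to produce can be sketched in plain C. This is only a host-side model, not TCG code: the names bswap16x4_model, bswap16x8_model and BSWAP16_MASK are made up for illustration, and the out-pointers merely stand in for the separate output temporaries. It shows the mask being set up once and reused for both 64-bit halves.

```c
#include <stdint.h>

/* Selects the low byte of every 16-bit lane; the one 64-bit constant
 * that both halves share. (Illustrative name, not from the patch.) */
#define BSWAP16_MASK UINT64_C(0x00FF00FF00FF00FF)

/* Swap the two bytes inside each of the four 16-bit lanes of x:
 * low bytes move up by 8 bits, high bytes move down by 8 bits. */
static inline uint64_t bswap16x4_model(uint64_t x, uint64_t mask)
{
    return ((x & mask) << 8) | ((x >> 8) & mask);
}

/* Hypothetical plain-C counterpart of the gen_bswap16x8 signature:
 * separate outputs, and a single mask reused for both halves. */
static void bswap16x8_model(uint64_t *outh, uint64_t *outl,
                            uint64_t inh, uint64_t inl)
{
    const uint64_t mask = BSWAP16_MASK;

    *outh = bswap16x4_model(inh, mask);
    *outl = bswap16x4_model(inl, mask);
}
```

With inh = 0x0011223344556677 and inl = 0x8899AABBCCDDEEFF, this gives outh = 0x1100332255447766 and outl = 0x9988BBAADDCCFFEE. A TCG version would presumably emit the same and/shift/or sequence once per half, using one mask temporary.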


r~
