Hi, Richard.

On 07/23/2021 01:44 PM, Richard Henderson wrote:
> On 7/20/21 11:53 PM, Song Gao wrote:
>> +uint64_t helper_fp_sqrt_d(CPULoongArchState *env, uint64_t fp)
>> +{
>> +    fp = float64_sqrt(fp, &env->active_fpu.fp_status);
>> +    update_fcsr0(env, GETPC());
>> +    return fp;
>> +}
>> +
>> +uint32_t helper_fp_sqrt_s(CPULoongArchState *env, uint32_t fp)
>> +{
>> +    fp = float32_sqrt(fp, &env->active_fpu.fp_status);
>> +    update_fcsr0(env, GETPC());
>> +    return fp;
>> +}
> 
> I believe you will find it easier to take and return uint64_t, even for 
> 32-bit operations.  The manual says that the high bits may contain any value, 
> so in my opinion you should not work hard to preserve the high bits, as you 
> currently do with
> 
>> +    gen_load_fpr32(fp0, a->fj);
>> +    gen_load_fpr32(fp1, a->fk);
>> +    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
>> +    gen_store_fpr32(fp0, a->fd);
> 
> I think this should be as simple as
> 
>   gen_helper_fp_add_s(cpu_fpu[a->fd], cpu_env,
>                       cpu_fpu[a->fj], cpu_fpu[a->fk]);
>
> I also think that loongarch should learn from risc-v and change the 
> architecture to "nan-box" single-precision results -- fill the high 32-bits 
> with 1s.  This is an SNaN representation for double-precision and will 
> immediately fail when incorrectly using a single-precision value as a 
> double-precision input.
> 
> Thankfully the current architecture is backward compatible with nan-boxing.
>

by this method,  the trans_fadd_s is   

static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
{
    TCGv_i64 fp0, fp1;

    fp0 = tcg_temp_new_i64();
    fp1 = tcg_temp_new_i64();

    check_fpu_enabled(ctx);
    gen_load_fpr64(fp0, a->fj);
    gen_load_fpr64(fp1, a->fk);
    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);

    gen_check_nanbox_s(fp0, fp0); /* from riscv */

    gen_store_fpr64(fp0, a->fd);

    tcg_temp_free_i64(fp0);
    tcg_temp_free_i64(fp1);

    return true;
}

uint64_t helper_fp_add_s(CPULoongArchState *env, uint64_t fp, uint64_t fp1)
{
    uint32_t fp2;

    fp2 = float32_add((uint32_t)fp, (uint32_t)fp1, &env->active_fpu.fp_status);
    update_fcsr0(env, GETPC());
    return (uint64_t)fp2;
}

is this right?

 
>> +/* Floating point arithmetic operation instruction translation */
>> +static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
>> +{
>> +    TCGv_i32 fp0, fp1;
>> +
>> +    fp0 = tcg_temp_new_i32();
>> +    fp1 = tcg_temp_new_i32();
>> +
>> +    check_fpu_enabled(ctx);
>> +    gen_load_fpr32(fp0, a->fj);
>> +    gen_load_fpr32(fp1, a->fk);
>> +    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
>> +    gen_store_fpr32(fp0, a->fd);
>> +
>> +    tcg_temp_free_i32(fp0);
>> +    tcg_temp_free_i32(fp1);
>> +
>> +    return true;
>> +}
> 
> Again, you should use some helper functions to reduce the repetition.
>
OK>> +static bool trans_fmadd_d(DisasContext *ctx, arg_fmadd_d *a)
>> +{
>> +    TCGv_i64 fp0, fp1, fp2, fp3;
>> +
>> +    fp0 = tcg_temp_new_i64();
>> +    fp1 = tcg_temp_new_i64();
>> +    fp2 = tcg_temp_new_i64();
>> +    fp3 = tcg_temp_new_i64();
>> +
>> +    check_fpu_enabled(ctx);
>> +    gen_load_fpr64(fp0, a->fj);
>> +    gen_load_fpr64(fp1, a->fk);
>> +    gen_load_fpr64(fp2, a->fa);
>> +    check_fpu_enabled(ctx);
> 
> Repeating check_fpu_enabled.
> 
OK.
>> +    gen_helper_fp_madd_d(fp3, cpu_env, fp0, fp1, fp2);
>> +    gen_store_fpr64(fp3, a->fd);
> 
> I think you might as well pass in the float_muladd_* constant to a single 
> helper rather than having 4 different helpers.
> 
OK
> 
> r~

Again. thank you kindly help.

Thanks
Song Gao.


Reply via email to