Hi, Richard. On 07/23/2021 01:44 PM, Richard Henderson wrote: > On 7/20/21 11:53 PM, Song Gao wrote: >> +uint64_t helper_fp_sqrt_d(CPULoongArchState *env, uint64_t fp) >> +{ >> + fp = float64_sqrt(fp, &env->active_fpu.fp_status); >> + update_fcsr0(env, GETPC()); >> + return fp; >> +} >> + >> +uint32_t helper_fp_sqrt_s(CPULoongArchState *env, uint32_t fp) >> +{ >> + fp = float32_sqrt(fp, &env->active_fpu.fp_status); >> + update_fcsr0(env, GETPC()); >> + return fp; >> +} > > I believe you will find it easier to take and return uint64_t, even for > 32-bit operations. The manual says that the high bits may contain any value, > so in my opinion you should not work hard to preserve the high bits, as you > currently do with > >> + gen_load_fpr32(fp0, a->fj); >> + gen_load_fpr32(fp1, a->fk); >> + gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1); >> + gen_store_fpr32(fp0, a->fd); > > I think this should be as simple as > > gen_helper_fp_add_s(cpu_fpu[a->fd], cpu_env, > cpu_fpu[a->fj], cpu_fpu[a->fk]); > > I also think that loongarch should learn from risc-v and change the > architecture to "nan-box" single-precision results -- fill the high 32-bits > with 1s. This is an SNaN representation for double-precision and will > immediately fail when incorrectly using a single-precision value as a > double-precision input. > > Thankfully the current architecture is backward compatible with nan-boxing. >
by this method, the trans_fadd_s is static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a) { TCGv_i64 fp0, fp1; fp0 = tcg_temp_new_i64(); fp1 = tcg_temp_new_i64(); check_fpu_enabled(ctx); gen_load_fpr64(fp0, a->fj); gen_load_fpr64(fp1, a->fk); gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1); gen_check_nanbox_s(fp0, fp0); /* from riscv */ gen_store_fpr64(fp0, a->fd); tcg_temp_free_i64(fp0); tcg_temp_free_i64(fp1); return true; } uint64_t helper_fp_add_s(CPULoongArchState *env, uint64_t fp, uint64_t fp1) { uint32_t fp2; fp2 = float32_add((uint32_t)fp, (uint32_t)fp1, &env->active_fpu.fp_status); update_fcsr0(env, GETPC()); return (uint64_t)fp2; } is this right? >> +/* Floating point arithmetic operation instruction translation */ >> +static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a) >> +{ >> + TCGv_i32 fp0, fp1; >> + >> + fp0 = tcg_temp_new_i32(); >> + fp1 = tcg_temp_new_i32(); >> + >> + check_fpu_enabled(ctx); >> + gen_load_fpr32(fp0, a->fj); >> + gen_load_fpr32(fp1, a->fk); >> + gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1); >> + gen_store_fpr32(fp0, a->fd); >> + >> + tcg_temp_free_i32(fp0); >> + tcg_temp_free_i32(fp1); >> + >> + return true; >> +} > > Again, you should use some helper functions to reduce the repetition. > OK>> +static bool trans_fmadd_d(DisasContext *ctx, arg_fmadd_d *a) >> +{ >> + TCGv_i64 fp0, fp1, fp2, fp3; >> + >> + fp0 = tcg_temp_new_i64(); >> + fp1 = tcg_temp_new_i64(); >> + fp2 = tcg_temp_new_i64(); >> + fp3 = tcg_temp_new_i64(); >> + >> + check_fpu_enabled(ctx); >> + gen_load_fpr64(fp0, a->fj); >> + gen_load_fpr64(fp1, a->fk); >> + gen_load_fpr64(fp2, a->fa); >> + check_fpu_enabled(ctx); > > Repeating check_fpu_enabled. > OK. >> + gen_helper_fp_madd_d(fp3, cpu_env, fp0, fp1, fp2); >> + gen_store_fpr64(fp3, a->fd); > > I think you might as well pass in the float_muladd_* constant to a single > helper rather than having 4 different helpers. > OK > > r~ Again. thank you kindly help. Thanks Song Gao.