On 03/22/2018 01:57 PM, Emilio G. Cota wrote:
>> Is there any especially good reason you want to not put this code into
>> the normal softfloat function?  Does it really make any measurable
>> difference at all to force this code to be inlined into a helper?
>
> You mean to do this? (... or see below)
>
> --- a/fpu/hostfloat.c
> +++ b/fpu/hostfloat.c
> @@ -97,7 +97,7 @@ GEN_INPUT_FLUSH(float64)
>
>  #define GEN_FPU_ADDSUB(add_name, sub_name, soft_t, host_t,          \
>                         host_abs_func, min_normal)                   \
> -    static inline __attribute__((always_inline)) soft_t             \
> +    static soft_t                                                   \
>      fpu_ ## soft_t ## _addsub(soft_t a, soft_t b, bool subtract,    \
>                                float_status *s)                      \
>      {                                                               \
>
> That slows add/sub dramatically, because addsub is not inlined into
> float32_add and float32_sub (that's an extra function call plus an
> extra branch per emulated op).
>
> For x86_64-linux-user/qemu-x86_64 tests/fp-bench -o add, the above gives
> - before: 188.06 MFlops
> - after:  117.56 MFlops
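(To spell out what that diff does: a minimal sketch, with placeholder
types and arithmetic standing in for the real hostfloat.c body.
Without the always_inline attribute, the compiler emits
fpu_float32_addsub as a standalone function:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t float32;                /* stand-in for QEMU's float32 */
typedef struct float_status float_status;

static float32    /* was: static inline __attribute__((always_inline)) */
fpu_float32_addsub(float32 a, float32 b, bool subtract, float_status *s)
{
    /* placeholder for the real host-FPU fast path + softfloat fallback */
    return subtract ? a - b : a + b;
}

float32 float32_add(float32 a, float32 b, float_status *s)
{
    return fpu_float32_addsub(a, b, false, s);
}

float32 float32_sub(float32 a, float32 b, float_status *s)
{
    return fpu_float32_addsub(a, b, true, s);
}

so each emulated add or sub now pays a real call plus the subtract
branch, which lines up with the MFlops drop above.)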
Well, not quite.  I meant putting all of the new code into softfloat.c,
and not attempting to inline any of it in target/cpu/foo_helper.c.

The best-written target helpers currently simply tail-call to
softfloat.c.  There are others that do so after minor argument
adjustment.  For these, it has been in my plans to rearrange things so
that e.g. float32_add is called from TCG directly, with no other
function calls at all.

For targets that cannot do that, I simply cannot bring myself to care
about the final percentage points enough to want to introduce extra
macros.

Another thought, re all of the soft_is_normal || soft_is_zero checks
that you're performing: I think it would be nice if we could work with
float*_unpack_canonical so that we don't have to duplicate work.  E.g.

/* Return true for float_class_normal || float_class_zero. */
static inline bool is_finite(FloatClass c)
{
    return c <= float_class_zero;
}

float32 float32_add(float32 a, float32 b, float_status *s)
{
    FloatClass a_cls = float32_classify(a);
    FloatClass b_cls = float32_classify(b);
    FloatParts pa, pb, pr;

    if (is_finite(a_cls) && is_finite(b_cls) && ...) {
        /* do hardfp thing */
    }

    pa = float32_unpack(a, a_cls, s);
    pb = float32_unpack(b, b_cls, s);
    pr = addsub_floats(pa, pb, s, false);
    return float32_round_pack(pr, s);
}

Where float32_classify produces Normal/Zero/Inf/NaN and might avoid
duplicate work within float32_unpack.

r~
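P.S. To make float32_classify concrete, here is a rough, self-contained
sketch.  The tests are just the standard IEEE-754 single-precision
field checks; the type and enum are illustrative stand-ins, not
existing softfloat.c code:

#include <stdint.h>

typedef uint32_t float32;   /* stand-in for QEMU's float32 */

/*
 * Ordered so that is_finite() above can test c <= float_class_zero.
 * Subnormals are lumped in with normal here, as the canonical unpack
 * would normalize them; a hardfp fast path would still need to reject
 * or flush them before trusting the host FPU.
 */
typedef enum {
    float_class_normal,
    float_class_zero,
    float_class_inf,
    float_class_nan,
} FloatClass;

static inline FloatClass float32_classify(float32 f)
{
    uint32_t exp  = (f >> 23) & 0xff;   /* biased exponent field */
    uint32_t frac = f & 0x7fffff;       /* fraction field */

    if (exp == 0xff) {
        return frac ? float_class_nan : float_class_inf;
    }
    if (exp == 0 && frac == 0) {
        return float_class_zero;
    }
    return float_class_normal;
}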