Re: [Qemu-devel] [PATCH v1 07/14] fpu: introduce hostfloat

Alex Bennée Tue, 27 Mar 2018 04:50:25 -0700

Emilio G. Cota <c...@braap.org> writes:

> The appended paves the way for leveraging the host FPU for a subset
> of guest FP operations. For most guest workloads (e.g. FP flags
> aren't ever cleared, inexact occurs often and rounding is set to the
> default [to nearest]) this will yield sizable performance speedups.
>
> The approach followed here avoids checking the FP exception flags register.
> See the comment at the top of hostfloat.c for details.
>
> This assumes that QEMU is running on an IEEE754-compliant FPU and
> that the rounding is set to the default (to nearest). The
> implementation-dependent specifics of the FPU should not matter; things
> like tininess detection and snan representation are still dealt with in
> soft-fp. However, this approach will break on most hosts if we compile
> QEMU with flags such as -ffast-math. We control the flags so this should
> be easy to enforce though.
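
For readers skimming the thread, the idea described above can be illustrated with a minimal sketch. This is not code from the patch: the struct, the flag values, and the names guest_add() and soft_add() are hypothetical stand-ins, and the soft-fp fallback is only a stub that counts how often it is taken. The assumption is that the fast path is used only when the guest's inexact flag is already set and rounding is to nearest, so inexact never needs to be detected and the host flags register is never read.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-ins for guest FP state; not QEMU's float_status. */
enum { FLAG_INEXACT = 1, FLAG_OVERFLOW = 2 };

struct guest_fp_state {
    unsigned flags;         /* accumulated guest exception flags */
    bool round_to_nearest;  /* guest uses the default rounding mode */
};

static int soft_calls;      /* counts how often the slow path is taken */

/* Stub for the soft-fp fallback; a real one computes result and flags. */
static float soft_add(float a, float b, struct guest_fp_state *s)
{
    (void)s;
    soft_calls++;
    return a + b;
}

static float guest_add(float a, float b, struct guest_fp_state *s)
{
    float r;

    /* Fast path only if inexact is already raised and rounding is default. */
    if (!(s->flags & FLAG_INEXACT) || !s->round_to_nearest) {
        return soft_add(a, b, s);
    }
    /* NaNs and infinities have guest-specific semantics: defer. */
    if (!isfinite(a) || !isfinite(b)) {
        return soft_add(a, b, s);
    }
    r = a + b;
    if (isinf(r)) {
        /* Finite inputs, infinite result: overflow, no flag register read. */
        s->flags |= FLAG_OVERFLOW;
        return r;
    }
    if (r <= FLT_MIN && r >= -FLT_MIN) {
        /* Zero or tiny result: underflow detection is hairy, so defer. */
        return soft_add(a, b, s);
    }
    return r;
}
```

The design choice worth noting is that every case in which exception detection would need the host flags register is routed to the software path instead.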


The one thing I would avoid is generating any x87 instructions, as we can
get weird effects if the compiler ever decides to stash a signalling NaN
in an x87 register.

Anyway perhaps -fno-fast-math should be explicit when building fpu/* code?
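
As a rough sketch of how that could be enforced from within the source as well as from the build flags: GCC documents that -ffast-math defines the __FAST_MATH__ preprocessor macro (assumed here to hold for other compilers too), and C99's FLT_EVAL_METHOD from <float.h> reports whether expressions are evaluated with excess precision, as with x87 code generation. The function names below are hypothetical and not part of the patch.

```c
#include <float.h>
#include <math.h>

/* Refuse to build FP emulation code with value-changing optimisations. */
#ifdef __FAST_MATH__
#error "fpu/ code must not be compiled with -ffast-math"
#endif

/*
 * Returns 1 if float and double expressions are evaluated in their own
 * type (FLT_EVAL_METHOD == 0), 0 otherwise (e.g. x87-style evaluation).
 */
static int host_fp_no_excess_precision(void)
{
#if defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 0
    return 1;
#else
    return 0;
#endif
}

/*
 * Runtime sanity check: a NaN must compare unequal to itself. The
 * volatile read keeps the compiler from folding the comparison away.
 */
static int host_fp_nan_semantics_ok(void)
{
    volatile double n = NAN;
    double x = n;

    return x != x;
}
```

A build could call both helpers once at startup and fall back to soft-fp if either returns 0, though whether that is worth doing is a judgment call.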

>
> The licensing in softfloat.h is complicated at best, so to keep things
> simple I'm adding this as a separate, GPL'ed file.

I don't think we need to worry about this. It's fine to add GPL-only
code to softfloat.c, and since the refactoring (or before, really) we
"own" this code and are unlikely to upstream anything.

My preference would be to include this all in softfloat.c unless there
is a very good reason not to.

>
> This patch just adds some boilerplate code; subsequent patches add
> operations, one per commit to ease bisection.
>
> Signed-off-by: Emilio G. Cota <c...@braap.org>
> ---
>  Makefile.target           |  2 +-
>  include/fpu/hostfloat.h   | 14 +++++++
>  include/fpu/softfloat.h   |  1 +
>  fpu/hostfloat.c           | 96 +++++++++++++++++++++++++++++++++++++++++++++++
>  target/m68k/Makefile.objs |  2 +-
>  tests/fp-test/Makefile    |  2 +-
>  6 files changed, 114 insertions(+), 3 deletions(-)
>  create mode 100644 include/fpu/hostfloat.h
>  create mode 100644 fpu/hostfloat.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 6549481..efcdfb9 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -97,7 +97,7 @@ obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
>  obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
>  obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
>  obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
> -obj-y += fpu/softfloat.o
> +obj-y += fpu/softfloat.o fpu/hostfloat.o
>  obj-y += target/$(TARGET_BASE_ARCH)/
>  obj-y += disas.o
>  obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
> diff --git a/include/fpu/hostfloat.h b/include/fpu/hostfloat.h
> new file mode 100644
> index 0000000..b01291b
> --- /dev/null
> +++ b/include/fpu/hostfloat.h
> @@ -0,0 +1,14 @@
> +/*
> + * Copyright (C) 2018, Emilio G. Cota <c...@braap.org>
> + *
> + * License: GNU GPL, version 2 or later.
> + *   See the COPYING file in the top-level directory.
> + */
> +#ifndef HOSTFLOAT_H
> +#define HOSTFLOAT_H
> +
> +#ifndef SOFTFLOAT_H
> +#error fpu/hostfloat.h must only be included from softfloat.h
> +#endif
> +
> +#endif /* HOSTFLOAT_H */
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 8fb44a8..8963b68 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -95,6 +95,7 @@ enum {
>  };
>
>  #include "fpu/softfloat-types.h"
> +#include "fpu/hostfloat.h"
>
>  static inline void set_float_detect_tininess(int val, float_status *status)
>  {
> diff --git a/fpu/hostfloat.c b/fpu/hostfloat.c
> new file mode 100644
> index 0000000..cab0341
> --- /dev/null
> +++ b/fpu/hostfloat.c
> @@ -0,0 +1,96 @@
> +/*
> + * hostfloat.c - FP primitives that use the host's FPU whenever possible.
> + *
> + * Copyright (C) 2018, Emilio G. Cota <c...@braap.org>
> + *
> + * License: GNU GPL, version 2 or later.
> + *   See the COPYING file in the top-level directory.
> + *
> + * Fast emulation of guest FP instructions is challenging for two reasons.
> + * First, FP instruction semantics are similar but not identical, particularly
> + * when handling NaNs. Second, emulating at reasonable speed the guest FP
> + * exception flags is not trivial: reading the host's flags register with a
> + * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
> + * and trapping on every FP exception is not fast nor pleasant to work with.
> + *
> + * This module leverages the host FPU for a subset of the operations. To
> + * do this it follows the main idea presented in this paper:
> + *
> + * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
> + * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
> + *
> + * The idea is thus to leverage the host FPU to (1) compute FP operations
> + * and (2) identify whether FP exceptions occurred while avoiding
> + * expensive exception flag register accesses.
> + *
> + * An important optimization shown in the paper is that given that exception
> + * flags are rarely cleared by the guest, we can avoid recomputing some flags.
> + * This is particularly useful for the inexact flag, which is very frequently
> + * raised in floating-point workloads.
> + *
> + * We optimize the code further by deferring to soft-fp whenever FP
> + * exception detection might get hairy. Fortunately this is not common.
> + */
> +#include <math.h>
> +
> +#include "qemu/osdep.h"
> +#include "fpu/softfloat.h"
> +
> +#define GEN_TYPE_CONV(name, to_t, from_t)       \
> +    static inline to_t name(from_t a)           \
> +    {                                           \
> +        to_t r = *(to_t *)&a;                   \
> +        return r;                               \
> +    }
> +
> +GEN_TYPE_CONV(float32_to_float, float, float32)
> +GEN_TYPE_CONV(float64_to_double, double, float64)
> +GEN_TYPE_CONV(float_to_float32, float32, float)
> +GEN_TYPE_CONV(double_to_float64, float64, double)
> +#undef GEN_TYPE_CONV
> +
> +#define GEN_INPUT_FLUSH(soft_t)                                         \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush__nocheck(soft_t *a, float_status *s)         \
> +    {                                                                   \
> +        if (unlikely(soft_t ## _is_denormal(*a))) {                     \
> +            *a = soft_t ## _set_sign(soft_t ## _zero,                   \
> +                                     soft_t ## _is_neg(*a));            \
> +            s->float_exception_flags |= float_flag_input_denormal;      \
> +        }                                                               \
> +    }                                                                   \
> +                                                                        \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush1(soft_t *a, float_status *s)                 \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +    }                                                                   \
> +                                                                        \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush2(soft_t *a, soft_t *b, float_status *s)      \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +        soft_t ## _input_flush__nocheck(b, s);                          \
> +    }                                                                   \
> +                                                                        \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush3(soft_t *a, soft_t *b, soft_t *c,            \
> +                            float_status *s)                            \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +        soft_t ## _input_flush__nocheck(b, s);                          \
> +        soft_t ## _input_flush__nocheck(c, s);                          \
> +    }
> +
> +GEN_INPUT_FLUSH(float32)
> +GEN_INPUT_FLUSH(float64)

Having spent time getting rid of a bunch of macro expansions, I'm wary of
adding more. For these, however, I guess it's kind of marginal.
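
For comparison, here is a minimal sketch of what the float32 flush helpers might look like written out as plain functions rather than generated by GEN_INPUT_FLUSH. It is self-contained for illustration only: f32_bits, struct fp_status, FLAG_INPUT_DENORMAL and the f32_* names are hypothetical stand-ins for float32, float_status and the softfloat helpers, not the real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t f32_bits;          /* hypothetical raw IEEE single */

enum { FLAG_INPUT_DENORMAL = 1 };   /* hypothetical flag value */

struct fp_status {                  /* hypothetical float_status subset */
    bool flush_inputs_to_zero;
    unsigned flags;
};

/* Denormal: biased exponent is zero and the fraction is non-zero. */
static inline bool f32_is_denormal(f32_bits a)
{
    return (a & 0x7f800000u) == 0 && (a & 0x007fffffu) != 0;
}

/* Flush one denormal input to a zero of the same sign and flag it. */
static inline void f32_input_flush1(f32_bits *a, struct fp_status *s)
{
    if (!s->flush_inputs_to_zero) {
        return;
    }
    if (f32_is_denormal(*a)) {
        *a &= 0x80000000u;          /* keep only the sign bit */
        s->flags |= FLAG_INPUT_DENORMAL;
    }
}

/* Two-operand variant built on the single-operand helper. */
static inline void f32_input_flush2(f32_bits *a, f32_bits *b,
                                    struct fp_status *s)
{
    f32_input_flush1(a, s);
    f32_input_flush1(b, s);
}
```

The cost of this form is one copy per type and a repeated flush_inputs_to_zero test per operand, which is presumably what the macro and the __nocheck split are avoiding.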

> +#undef GEN_INPUT_FLUSH
> diff --git a/target/m68k/Makefile.objs b/target/m68k/Makefile.objs
> index ac61948..2868b11 100644
> --- a/target/m68k/Makefile.objs
> +++ b/target/m68k/Makefile.objs
> @@ -1,5 +1,5 @@
>  obj-y += m68k-semi.o
>  obj-y += translate.o op_helper.o helper.o cpu.o
> -obj-y += fpu_helper.o softfloat.o
> +obj-y += fpu_helper.o softfloat.o hostfloat.o
>  obj-y += gdbstub.o
>  obj-$(CONFIG_SOFTMMU) += monitor.o
> diff --git a/tests/fp-test/Makefile b/tests/fp-test/Makefile
> index 703434f..187cfcc 100644
> --- a/tests/fp-test/Makefile
> +++ b/tests/fp-test/Makefile
> @@ -28,7 +28,7 @@ ibm:
>  $(WHITELIST_FILES):
>       wget -nv -O $@ http://www.cs.columbia.edu/~cota/qemu/fpbench-$@
>
> -fp-test$(EXESUF): fp-test.o softfloat.o
> +fp-test$(EXESUF): fp-test.o softfloat.o hostfloat.o
>
>  clean:
>       rm -f *.o *.d $(OBJS)


--
Alex Bennée
