On 5/5/23 14:27, Peter Maydell wrote:
On Wed, 3 May 2023 at 08:18, Richard Henderson
<richard.hender...@linaro.org> wrote:

Use the fpu to perform 64-bit loads and stores.

Signed-off-by: Richard Henderson <richard.hender...@linaro.org>


@@ -2091,7 +2095,20 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
              datalo = datahi;
              datahi = t;
          }
-        if (h.base == datalo || h.index == datalo) {
+        if (h.atom == MO_64) {
+            /*
+             * Atomicity requires that we use a single 8-byte load.
+             * For simplicity and code size, always use the FPU for this.
+             * Similar insns using SSE/AVX are merely larger.

I'm surprised there's no performance penalty for throwing old-school
FPU insns into what is presumably otherwise code that's only
using modern SSE.

I have no idea about performance.  We don't require SSE for TCG at the moment.

I assume the caller has arranged that the top of the stack
is trashable at this point?

The entire fpu stack is call-clobbered.
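For readers unfamiliar with the trick, here is a minimal user-space sketch in C. It is not QEMU's TCG emitter code, and the names load64_atomic/store64_atomic are hypothetical. It assumes GCC or Clang inline asm on an x86 host and an 8-byte-aligned address. fild reads all 8 bytes in one memory access and fistp writes all 8 bytes in one access (per Intel's SDM, aligned quadword accesses are atomic on Pentium and later). Because the x87 significand is 64 bits wide, the integer round trip is exact for every bit pattern. The push/pop pair leaves the FPU stack depth unchanged, which is why only the stack top needs to be free. Other hosts fall back to the GCC __atomic builtins.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: single-copy-atomic 8-byte load/store through the x87 FPU.
 * fild pushes the 64-bit integer onto the x87 stack, fistp pops it back
 * to memory, so the FPU stack depth is the same before and after.
 */
static inline uint64_t load64_atomic(const uint64_t *p)
{
    uint64_t r;
    /* Atomicity of the 8-byte access relies on natural alignment. */
    assert(((uintptr_t)p & 7) == 0);
#if defined(__i386__) || defined(__x86_64__)
    /* One 8-byte read of *p (fild), then a plain store into r (fistp). */
    __asm__ volatile("fildll %1\n\tfistpll %0"
                     : "=m"(r)
                     : "m"(*p)
                     : "st");
#else
    /* Fallback on non-x86 hosts: let the compiler choose the insn. */
    r = __atomic_load_n(p, __ATOMIC_RELAXED);
#endif
    return r;
}

static inline void store64_atomic(uint64_t *p, uint64_t v)
{
    assert(((uintptr_t)p & 7) == 0);
#if defined(__i386__) || defined(__x86_64__)
    /* Plain read of v (fild), then one 8-byte write to *p (fistp). */
    __asm__ volatile("fildll %1\n\tfistpll %0"
                     : "=m"(*p)
                     : "m"(v)
                     : "st");
#else
    __atomic_store_n(p, v, __ATOMIC_RELAXED);
#endif
}
```

The "st" clobber tells the compiler that the asm needs one free x87 stack slot, which matches the point that the FPU stack is treated as scratch here. FILD is not affected by the x87 precision-control field, so the round trip stays exact even if a host has reduced the FPU precision.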


r~

