Hi,
I am going to benchmark the following hunk separately tonight. It is an
independent change.

Rth, Vladimir: there are obviously several options for making GCC use SSE for
64bit loads/stores in 32bit codegen (and 128bit loads/stores in 64bit
codegen). Which variant do you think is best here?
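
For illustration, here is a minimal sketch of the kind of source that the
first new branch in the hunk below is aimed at: a 64bit memory-to-memory copy
built for 32bit with SSE2 enabled (for example with -m32 -msse2). The function
name copy64 and the choice of flags are assumptions made for this example and
are not part of the patch; whether the move actually reaches ix86_expand_move
with both operands still in memory depends on the earlier passes and has not
been verified here.

```c
#include <stdint.h>

/* Hypothetical test case, not taken from the patch.
   A plain 64bit copy from memory to memory.  In 32bit codegen this is
   normally split into two 32bit integer loads and two 32bit stores; the
   idea under discussion is to do it with one SSE load and one SSE store
   instead.  */
void
copy64 (uint64_t *dst, const uint64_t *src)
{
  *dst = *src;
}
```

Comparing the assembly for this function with and without the hunk applied
would be one way to see whether the SSE path is taken.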

(An alternative would be to make the move patterns prefer the SSE variant in this
case, or to change the RA order to iterate through SSE first, but at least with
pre-IRA this used to lead to bad decisions, making the RA place a value in SSE
even though it is used in arithmetic that can't be done with SSE.)

Honza

@@ -15266,6 +15363,38 @@ ix86_expand_move (enum machine_mode mode, rtx operands[])
     }
   else
     {
+      if (mode == DImode
+         && !TARGET_64BIT
+         && TARGET_SSE2
+         && MEM_P (op0)
+         && MEM_P (op1)
+         && !push_operand (op0, mode)
+         && can_create_pseudo_p ())
+       {
+         rtx temp = gen_reg_rtx (V2DImode);
+         emit_insn (gen_sse2_loadq (temp, op1));
+         emit_insn (gen_sse_storeq (op0, temp));
+         return;
+       }
+      if (mode == DImode
+         && !TARGET_64BIT
+         && TARGET_SSE
+         && !MEM_P (op1)
+         && GET_MODE (op1) == V2DImode)
+       {
+         emit_insn (gen_sse_storeq (op0, op1));
+         return;
+       }
+      if (mode == TImode
+         && TARGET_AVX2
+         && MEM_P (op0)
+         && !MEM_P (op1)
+         && GET_MODE (op1) == V4DImode)
+       {
+         op0 = convert_to_mode (V2DImode, op0, 1);
+         emit_insn (gen_vec_extract_lo_v4di (op0, op1));
+         return;
+       }
       if (MEM_P (op0)
          && (PUSH_ROUNDING (GET_MODE_SIZE (mode)) != GET_MODE_SIZE (mode)
              || !push_operand (op0, mode))
