Hello,
I definitely don't expect the attached patch to be accepted, but I would
like some advice on the direction to take, and a patch that passes the
testsuite and performs the optimization I want on a couple of testcases
seems like a good way to start the conversation. This is the first time I
have even looked at .md files...
The goal is to optimize: v8sf x; v4sf y=*(v4sf*)&x; so the compiler
doesn't copy x to memory (yes, I know there is an intrinsic to do that).
If I understood Richard Guenther's comment in the PR correctly, this can be
optimized in the back-end. The only place I found to put this kind of
transformation is a define_peephole2. I couldn't figure out how to test
whether 2 memory operands refer to the same address when they have
different types (which is why match_dup is unhappy). For some reason,
comparing XEXP(*,0) of the two operands said yes on my test and no when
using an unrelated piece of memory, but it looks like a nonsense test that
just gets lucky on a couple of trivial examples.
Any help?
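
For concreteness, here is a minimal self-contained sketch of the source pattern in question, written with GCC's generic vector extensions so that it builds without any target flags. The function names low_half_by_cast and low_halves_agree are made up for this illustration and are not part of the patch. The first function is the cast idiom from the PR, taking its argument by pointer only to keep the example portable; the second checks it against an element-by-element copy of the low four floats.

```c
#include <assert.h>
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef float v8sf __attribute__ ((vector_size (32)));

/* The idiom from the PR: read the low 128 bits of a 256-bit vector
   through a pointer cast.  (Hypothetical helper name.)  */
v4sf
low_half_by_cast (const v8sf *x)
{
  return *(const v4sf *) x;
}

/* Return 1 if the cast yields the same bits as copying elements 0..3
   one by one, 0 otherwise.  (Hypothetical helper name.)  */
int
low_halves_agree (void)
{
  v8sf x = { 1, 2, 3, 4, 5, 6, 7, 8 };
  v4sf by_cast = low_half_by_cast (&x);
  v4sf by_index = { x[0], x[1], x[2], x[3] };

  assert (sizeof (v8sf) == 2 * sizeof (v4sf));
  return memcmp (&by_cast, &by_index, sizeof by_cast) == 0;
}
```

Compiling something like this with -O2 -mavx -S (the options used by the new test, plus -S) and looking for a store followed by a reload of the same stack slot is one way to see whether x still goes through memory. The intrinsic alluded to above is presumably something like _mm256_castps256_ps128 from <immintrin.h>; that is an assumption on the part of this sketch.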

2012-05-02  Marc Glisse  <[email protected]>

	PR target/53101

gcc/
	* config/i386/sse.md: New peephole2 for subvectors.

gcc/testsuite/
	* gcc.target/i386/pr53101.c: New test.

--
Marc Glisse

Index: gcc/testsuite/gcc.target/i386/pr53101.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr53101.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr53101.c (revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+
+v4si
+avx_extract_v4si (v8si x)
+{
+ return *(v4si*)&x;
+}
+
+v2df
+avx_extract_v2df (v4df x __attribute((unused)), v4df y)
+{
+ return *(v2df*)&y;
+}
+
+/* { dg-final { scan-assembler-not "movdq" } } */
+/* { dg-final { scan-assembler-times "movapd" 1 } } */
Property changes on: gcc/testsuite/gcc.target/i386/pr53101.c
___________________________________________________________________
Added: svn:keywords
+ Author Date Id Revision URL
Added: svn:eol-style
+ native
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md (revision 187012)
+++ gcc/config/i386/sse.md (working copy)
@@ -4104,10 +4104,34 @@
emit_move_insn (operands[0], adjust_address (operands[1], SFmode, i*4));
DONE;
})
+;; This is how we receive accesses to the first half of a vector.
+(define_peephole2
+ [(set (match_operand:VI8F_256 3 "memory_operand")
+ (match_operand:VI8F_256 1 "register_operand"))
+ (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+ (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+ "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+ [(set (match_dup 0)
+ (vec_select:<ssehalfvecmode> (match_dup 1)
+ (parallel [(const_int 0) (const_int 1)])))]
+)
+
+(define_peephole2
+ [(set (match_operand:VI4F_256 3 "memory_operand")
+ (match_operand:VI4F_256 1 "register_operand"))
+ (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+ (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+ "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+ [(set (match_dup 0)
+ (vec_select:<ssehalfvecmode> (match_dup 1)
+ (parallel [(const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))]
+)
+
(define_expand "avx_vextractf128<mode>"
[(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
(match_operand:V_256 1 "register_operand")
(match_operand:SI 2 "const_0_to_1_operand")]
"TARGET_AVX"