This volatile_spec tells the compiler it does not touch any of the
registers so ira and reload can insert its instructions in either
place. Lying to reload is bad news.
Sent from my iPhone
On May 29, 2010, at 8:26 AM, "H.J. Lu" <hjl.to...@gmail.com> wrote:
On Fri, May 28, 2010 at 9:08 PM, Vladimir N. Makarov
<vmaka...@redhat.com> wrote:
On 05/28/2010 12:38 PM, H.J. Lu wrote:
Hi,
I want to generate vzeroupper when I know upper 128bits aren't
used. I
can't find
a way to mark an pattern which zeros upper 128bits. So I added
;; Clear the upper 128bits of AVX registers, equivalent to a NOP.
;; This should be used only when the upper 128bits are unused.
(define_insn "avx_vzeroupper_nop"
[(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
"TARGET_AVX"
"vzeroupper"
[(set_attr "type" "sse")
(set_attr "modrm" "0")
(set_attr "memory" "none")
(set_attr "prefix" "vex")
(set_attr "mode" "OI")])
For this simple code,
---
typedef float __m256 __attribute__ ((__vector_size__ (32),
__may_alias__));
extern __m256 x, z;
extern void bar2 (void);
int
foo (__m256 y)
{
bar2 ();
z = y;
return 0;
}
---
before IRA,
(insn 2 4 3 2 x.i:9 (set (reg/v:V8SF 59 [ y ])
(reg:V8SF 21 xmm0 [ y ])) 1036 {*avx_movv8sf_internal}
(expr_list:REG_DEAD (reg:V8SF 21 xmm0 [ y ])
(nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 x.i:10 (unspec_volatile [
(const_int 0 [0])
] 17) 1960 {avx_vzeroupper_nop} (nil))
(call_insn 7 6 8 2 x.i:10 (call (mem:QI (symbol_ref:DI ("bar2")
[flags
0x41]<function_decl 0x7ffa930ecd00 bar2>) [0 S1 A8])
(const_int 0 [0])) 599 {*call_0} (nil)
(nil))
after IRA,
(insn 6 3 20 2 x.i:10 (unspec_volatile [
(const_int 0 [0])
] 17) 1960 {avx_vzeroupper_nop} (nil))
(insn 20 6 7 2 x.i:10 (set (mem/c:V8SF (reg/f:DI 7 sp) [3 S32 A256])
(reg:V8SF 21 xmm0)) 1036 {*avx_movv8sf_internal} (nil))
(call_insn 7 20 21 2 x.i:10 (call (mem:QI (symbol_ref:DI ("bar2")
[flags 0x41]<function_decl 0x7ffa930ecd00 bar2>) [0 S1 A8])
(const_int 0 [0])) 599 {*call_0} (nil)
(nil))
Since vzeroupper will change xmm0/ymm0, the value saved on stack
is wrong.
Is that a way to tell IRA not to move an instruction?
I think IRA itself is not responsible for this. This is probably
reload
more accurately caller-saves.c. The result of reload is in ira
dump so it
only looks that it is IRA.
Insn 20 probably is generated by caller-saves.c. I can not find
special
treatment of unspec_volatile as changing all registers in caller-
saves.c.
So I think here is the problem. But this is only my speculations,
some
investigation should be done to be sure (e.g. where is insn 20
generated).
XMM0 is caller-saved. insn 20 came from insn 2 which saves XMM0
onto stack. IRA/reload wants to do it right before call at -O2. I
opened a
bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44323
--
H.J.