On 05/28/2010 12:38 PM, H.J. Lu wrote:
Hi,

I want to generate vzeroupper when I know the upper 128 bits aren't used. I
can't find a way to mark a pattern which zeros the upper 128 bits. So I added

;; Clear the upper 128bits of AVX registers, equivalent to a NOP.
;; This should be used only when the upper 128bits are unused.
(define_insn "avx_vzeroupper_nop"
   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
   "TARGET_AVX"
   "vzeroupper"
   [(set_attr "type" "sse")
    (set_attr "modrm" "0")
    (set_attr "memory" "none")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])

For this simple code,

---
typedef float __m256 __attribute__ ((__vector_size__ (32),
          __may_alias__));

extern __m256 x, z;
extern void bar2 (void);

int
foo (__m256 y)
{
   bar2 ();
   z = y;
   return 0;
}
---

before IRA,

(insn 2 4 3 2 x.i:9 (set (reg/v:V8SF 59 [ y ])
         (reg:V8SF 21 xmm0 [ y ])) 1036 {*avx_movv8sf_internal}
(expr_list:REG_DEAD (reg:V8SF 21 xmm0 [ y ])
         (nil)))

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(insn 6 3 7 2 x.i:10 (unspec_volatile [
             (const_int 0 [0])
         ] 17) 1960 {avx_vzeroupper_nop} (nil))

(call_insn 7 6 8 2 x.i:10 (call (mem:QI (symbol_ref:DI ("bar2") [flags
0x41]<function_decl 0x7ffa930ecd00 bar2>) [0 S1 A8])
         (const_int 0 [0])) 599 {*call_0} (nil)
     (nil))


after IRA,

(insn 6 3 20 2 x.i:10 (unspec_volatile [
             (const_int 0 [0])
         ] 17) 1960 {avx_vzeroupper_nop} (nil))

(insn 20 6 7 2 x.i:10 (set (mem/c:V8SF (reg/f:DI 7 sp) [3 S32 A256])
         (reg:V8SF 21 xmm0)) 1036 {*avx_movv8sf_internal} (nil))

(call_insn 7 20 21 2 x.i:10 (call (mem:QI (symbol_ref:DI ("bar2")
[flags 0x41]<function_decl 0x7ffa930ecd00 bar2>) [0 S1 A8])
         (const_int 0 [0])) 599 {*call_0} (nil)
     (nil))

Since vzeroupper will change xmm0/ymm0, the value saved on the stack is wrong.
Is there a way to tell IRA not to move an instruction?

I think IRA itself is not responsible for this. It is probably reload, or more accurately caller-saves.c. The result of reload ends up in the IRA dump, so it only looks as if IRA did it.

Insn 20 is probably generated by caller-saves.c. I cannot find any special treatment in caller-saves.c of unspec_volatile as changing all registers, so I think the problem is there. But this is only speculation on my part; some investigation is needed to be sure (e.g. finding where insn 20 is generated).
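
To make the suspected ordering problem concrete, here is a toy model in C. It is not GCC code and does not reflect how caller-saves.c is actually written; the names toy_kind, save_slot_naive and save_slot_barrier_aware are hypothetical. The instruction stream is an array of kinds, and each function returns the index at which a caller-save store would be inserted for a given call. The naive version places the store directly before the call, which corresponds to insn 20 landing between insn 6 and insn 7. The second version assumes a volatile instruction may change any register and hoists the store above it.

```c
#include <stddef.h>

/* Toy instruction kinds; illustrative only, not GCC's RTL codes. */
enum toy_kind { TOY_MOVE, TOY_VOLATILE, TOY_CALL };

/* Naive placement: the save is inserted at the call's own index,
   i.e. immediately before the call, whatever precedes it. */
size_t save_slot_naive(const enum toy_kind *insns, size_t call_idx)
{
    (void) insns;   /* the preceding instructions are not inspected */
    return call_idx;
}

/* Barrier-aware placement (an assumption about the desired behaviour):
   walk backwards over volatile instructions that directly precede the
   call and insert the save before the first of them. */
size_t save_slot_barrier_aware(const enum toy_kind *insns, size_t call_idx)
{
    size_t slot = call_idx;
    while (slot > 0 && insns[slot - 1] == TOY_VOLATILE)
        slot--;
    return slot;
}
```

For the stream { TOY_MOVE, TOY_VOLATILE, TOY_CALL } with the call at index 2, the naive function returns 2 (the save follows the volatile instruction) and the barrier-aware one returns 1 (the save precedes it). Whether the real fix belongs in the save placement or in how the vzeroupper pattern describes its effect on registers is exactly what needs investigating.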
