On Fri, May 28, 2010 at 9:08 PM, Vladimir N. Makarov <vmaka...@redhat.com> wrote: > On 05/28/2010 12:38 PM, H.J. Lu wrote: >> >> Hi, >> >> I want to generate vzeroupper when I know upper 128bits aren't used. I >> can't find >> a way to mark an pattern which zeros upper 128bits. So I added >> >> ;; Clear the upper 128bits of AVX registers, equivalent to a NOP. >> ;; This should be used only when the upper 128bits are unused. >> (define_insn "avx_vzeroupper_nop" >> [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)] >> "TARGET_AVX" >> "vzeroupper" >> [(set_attr "type" "sse") >> (set_attr "modrm" "0") >> (set_attr "memory" "none") >> (set_attr "prefix" "vex") >> (set_attr "mode" "OI")]) >> >> For this simple code, >> >> --- >> typedef float __m256 __attribute__ ((__vector_size__ (32), >> __may_alias__)); >> >> extern __m256 x, z; >> extern void bar2 (void); >> >> int >> foo (__m256 y) >> { >> bar2 (); >> z = y; >> return 0; >> } >> --- >> >> before IRA, >> >> (insn 2 4 3 2 x.i:9 (set (reg/v:V8SF 59 [ y ]) >> (reg:V8SF 21 xmm0 [ y ])) 1036 {*avx_movv8sf_internal} >> (expr_list:REG_DEAD (reg:V8SF 21 xmm0 [ y ]) >> (nil))) >> >> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) >> >> (insn 6 3 7 2 x.i:10 (unspec_volatile [ >> (const_int 0 [0]) >> ] 17) 1960 {avx_vzeroupper_nop} (nil)) >> >> (call_insn 7 6 8 2 x.i:10 (call (mem:QI (symbol_ref:DI ("bar2") [flags >> 0x41]<function_decl 0x7ffa930ecd00 bar2>) [0 S1 A8]) >> (const_int 0 [0])) 599 {*call_0} (nil) >> (nil)) >> >> >> after IRA, >> >> (insn 6 3 20 2 x.i:10 (unspec_volatile [ >> (const_int 0 [0]) >> ] 17) 1960 {avx_vzeroupper_nop} (nil)) >> >> (insn 20 6 7 2 x.i:10 (set (mem/c:V8SF (reg/f:DI 7 sp) [3 S32 A256]) >> (reg:V8SF 21 xmm0)) 1036 {*avx_movv8sf_internal} (nil)) >> >> (call_insn 7 20 21 2 x.i:10 (call (mem:QI (symbol_ref:DI ("bar2") >> [flags 0x41]<function_decl 0x7ffa930ecd00 bar2>) [0 S1 A8]) >> (const_int 0 [0])) 599 {*call_0} (nil) >> (nil)) >> >> Since vzeroupper will change xmm0/ymm0, the value saved on stack is wrong. >> Is that a way to tell IRA not to move an instruction? >> >> > > I think IRA itself is not responsible for this. This is probably reload > more accurately caller-saves.c. The result of reload is in ira dump so it > only looks that it is IRA. > > Insn 20 probably is generated by caller-saves.c. I can not find special > treatment of unspec_volatile as changing all registers in caller-saves.c. > So I think here is the problem. But this is only my speculations, some > investigation should be done to be sure (e.g. where is insn 20 generated). > >
XMM0 is caller-saved. insn 20 came from insn 2 which saves XMM0 onto stack. IRA/reload wants to do it right before call at -O2. I opened a bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44323 -- H.J.