After some more digging and adjusting I found additional cases where registers
are optimized out, so I decided to continue this thread to keep the discussion
compact.

With some changes, a simplified version of my expansion is as follows:
tmp_op0 = gen_reg_rtx (mode);
emit_move_insn (tmp_op0, op0);
tmp_op1 = gen_reg_rtx (mode);
emit_move_insn (tmp_op1, op1);

// This is the important part
reg = gen_rtx_REG (wide_mode, XMM2_REG);
emit_insn (gen_rtx_SET (reg, tmp_op1));

emit_insn (gen_myinsn(op2, reg));

emit_insn (gen_rtx_SET (tmp_op0, reg));
////

And my md is as follows:
(define_insn "myinsn"
  [(unspec [(match_operand:SI 0 "register_operand" "r")
            (match_operand:V4SI 1 "vector_operand")]
            UNSPEC_MYINSN)
   (clobber (reg:V4SI XMM2_REG))]
  "TARGET_MYTARGET"
  "instr\t%0"
  [(set_attr "type" "other")])

This works like a charm when built at any optimization level, producing
something like this:

movdqu  %eax, %xmm2
instr   %edx
movups  %xmm2, %eax

Unfortunately, when I build it with the additional -mavx2 or -mavx512f flag,
the first move (from reg to xmm2) is optimized out. I'm using those extra
flags because I also want to use YMM2 and ZMM2 in my instruction.

Does anyone have an idea why this might happen, and how it can be overcome?

Thanks,
Sebastian


> -----Original Message-----
> Subject: Re: Question regarding preventing optimizing out of register in
> expansion
> 
> On 06/21/2018 05:20 AM, Peryt, Sebastian wrote:
> > Hi,
> >
> > I'd appreciate if someone could advise me in builtin expansion I'm currently
> writing.
> >
> > High level description for what I want to do:
> >
> > I have 2 operands in my builtin.
> 
> IIUC you're defining an UNSPEC.
> 
> > First I set register (reg1) with value from operand1 (op1); Second I
> > call my instruction (reg1 is called implicitly and updated);
> 
> Here is your error -- NEVER have implicit register settings.  The data flow
> analysers need accurate information.
> 
> 
> > Simplified implementation in i386.c I have:
> >
> > reg1 = gen_reg_rtx (mode);
> > emit_insn (gen_rtx_SET (reg1, op1);
> > emit_clobber (reg1);
> 
> At this point reg1 is dead.  That means the previous set of reg1 from
> op1 is unneeded and can be deleted.
> 
> > emit_insn (gen_myinstruction ());
> 
> This instruction has no inputs or outputs, and is not marked volatile(?)
> so can be deleted.
> 
> > emit_insn (gen_rtx_SET (op2,reg1));
> 
> And this is storing a value from a dead register.
> 
> You need something like:
>    rtx reg1 = force_reg (mode, op1);
>    rtx reg2 = gen_reg_rtx (mode);
>    emit_insn (gen_my_insn (reg2, reg1));
>    emit_insn (gen_rtx_SET (op2, reg2));
> 
> your instruction should be an UNSPEC showing what the inputs and outputs
> are.  That tells the optimizers what depends on what, but the compiler
> has no clue about what the transform is.
> 
> nathan
> --
> Nathan Sidwell
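
Following the quoted advice about making inputs and outputs explicit, here is
a minimal, untested sketch of how the pattern could expose XMM2 as both an
input and an output instead of only clobbering it. It assumes the instruction
reads and writes XMM2 in V4SI mode and takes a single general-register
operand; the expander would load XMM2 before emitting the insn and copy XMM2
out afterwards.

(define_insn "myinsn"
  ;; Sketch only: XMM2 is read inside the unspec and written by the set,
  ;; so the preceding load of XMM2 has a visible use.
  [(set (reg:V4SI XMM2_REG)
        (unspec:V4SI [(match_operand:SI 0 "register_operand" "r")
                      (reg:V4SI XMM2_REG)]
                     UNSPEC_MYINSN))]
  "TARGET_MYTARGET"
  "instr\t%0"
  [(set_attr "type" "other")])

With this shape gen_myinsn would take only the SI operand. Whether it also
resolves the -mavx2 / -mavx512f case has not been verified here; the YMM2 and
ZMM2 variants would presumably need the same pattern in the matching wider
modes.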
