On Fri, Oct 24, 2014 at 07:19:53PM +0400, Evgeny Stupachenko wrote:
> >What is wrong in emitting the set_got right before the PROLOGUE_END
> >note and that way sharing a single load from both?
> Can you please explain the idea? Now set_got emitted right after
> PROLOGUE_END, what is the advantage in emitting it right before?
> Which load is going to be shared?

I thought I had already explained this.
In ix86_init_pic_reg 32-bit part, if crtl->profile, instead of
      rtx insn = emit_insn (gen_set_got (pic_offset_table_rtx));
      RTX_FRAME_RELATED_P (insn) = 1;
do:
      rtx reg = crtl->profile
                ? gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)
                : pic_offset_table_rtx;
      rtx insn = emit_insn (gen_set_got (reg));
      RTX_FRAME_RELATED_P (insn) = 1;
      if (crtl->profile)
        emit_move_insn (pic_offset_table_rtx, reg);
or so.  That will ensure the RA will most likely allocate the pic pseudo
to %ebx at the start of the function, and even if it doesn't, it will still
be loaded into that reg and only then moved to some other reg.

Then, supposedly you need to tweak the condition in ix86_save_reg:
  if (pic_offset_table_rtx
      && !ix86_use_pseudo_pic_reg ()
      && regno == REAL_PIC_OFFSET_TABLE_REGNUM
      && (df_regs_ever_live_p (REAL_PIC_OFFSET_TABLE_REGNUM)
          || crtl->profile
          || crtl->calls_eh_return
          || crtl->uses_const_pool
          || cfun->has_nonlocal_label))
    return ix86_select_alt_pic_regnum () == INVALID_REGNUM;
to something like:
  if (regno == REAL_PIC_OFFSET_TABLE_REGNUM
      && pic_offset_table_rtx)
    {
      if (ix86_use_pseudo_pic_reg ())
        {
          /* %ebx needed by call to _mcount after the prologue.  */
          if (!TARGET_64BIT && flag_pic && crtl->profile)
            return true;
        }
      else if (df_regs_ever_live_p (REAL_PIC_OFFSET_TABLE_REGNUM)
               || crtl->profile
               || crtl->calls_eh_return
               || crtl->uses_const_pool
               || cfun->has_nonlocal_label)
        return ix86_select_alt_pic_regnum () == INVALID_REGNUM;
    }
which will make sure the prologue/epilogue saves/restores %ebx properly.
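
To make the intended truth table easier to check, here is a standalone
model of just that PIC-specific clause.  It is not GCC code: struct
pic_state and model_save_ebx_for_pic are made-up names, and each field
merely stands in for the GCC query named in its comment.  The real
ix86_save_reg goes on to other checks when this clause does not decide;
the model simply returns false there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the real predicate would query.  */
struct pic_state
{
  bool have_pic_reg;          /* pic_offset_table_rtx != NULL */
  bool pseudo_pic_reg;        /* ix86_use_pseudo_pic_reg () */
  bool target_64bit;          /* TARGET_64BIT */
  bool flag_pic;              /* flag_pic != 0 */
  bool profile;               /* crtl->profile */
  bool ebx_ever_live;         /* df_regs_ever_live_p (REAL_PIC_OFFSET_TABLE_REGNUM) */
  bool calls_eh_return;       /* crtl->calls_eh_return */
  bool uses_const_pool;       /* crtl->uses_const_pool */
  bool has_nonlocal_label;    /* cfun->has_nonlocal_label */
  bool alt_pic_reg_available; /* ix86_select_alt_pic_regnum () != INVALID_REGNUM */
};

/* Model of the proposed clause for regno == REAL_PIC_OFFSET_TABLE_REGNUM.
   Returns true when the clause says %ebx must be saved/restored.  */
bool
model_save_ebx_for_pic (const struct pic_state *s)
{
  assert (s != NULL);

  if (!s->have_pic_reg)
    return false;

  if (s->pseudo_pic_reg)
    /* %ebx is needed by the call to _mcount after the prologue.  */
    return !s->target_64bit && s->flag_pic && s->profile;

  if (s->ebx_ever_live
      || s->profile
      || s->calls_eh_return
      || s->uses_const_pool
      || s->has_nonlocal_label)
    return !s->alt_pic_reg_available;

  return false;
}
```

With the pseudo pic register in use, the model answers true only for
32-bit PIC code built with profiling, which is exactly the case where the
_mcount call needs %ebx.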

And, finally, for the !TARGET_64BIT && flag_pic && crtl->profile case,
e.g. at the end of ix86_expand_prologue, check whether the prologue is
followed by a series of notes (one of which is the PROLOGUE_END note) but no
real insns, and then by a set_got pattern (perhaps check that it recogs to
CODE_FOR_set_got) that loads into %ebx.  If it is, then fine, just
move that insn from where it was emitted to before those notes.
If you don't find it there, emit a set_got insn to %ebx yourself at the end
of the prologue.  Then there is no need to change the _mcount call in any way.
The profiler code is emitted on the PROLOGUE_END note, so if you managed
to move the set_got across the PROLOGUE_END note, or if you added an extra
one (e.g. for the case when no set_got was really needed in the rest of the
function), at that point the pic register will be allocated in %ebx.
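
The scan described above can be sketched on a toy insn stream.  This is
only a model under stated assumptions, not GCC code: the insn chain is a
plain array, enum insn_kind, enum fixup and place_set_got_before_notes are
invented for illustration, and the check that one of the skipped notes
really is PROLOGUE_END is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical insn kinds; the real code would look at NOTE_P, recog, etc.  */
enum insn_kind { INSN_OTHER, NOTE_OTHER, NOTE_PROLOGUE_END, INSN_SET_GOT_EBX };

enum fixup { FIXUP_MOVED, FIXUP_EMITTED };

static int
is_note (enum insn_kind k)
{
  return k == NOTE_OTHER || k == NOTE_PROLOGUE_END;
}

/* START is the index of the first element after the prologue insns.
   Skip the notes that follow; if the next real insn is a set_got into
   %ebx, move it to START (before the notes), otherwise insert a fresh
   one at START.  CAP is the capacity of STREAM.  */
enum fixup
place_set_got_before_notes (enum insn_kind *stream, size_t *len,
                            size_t cap, size_t start)
{
  assert (stream != NULL && len != NULL);
  assert (start <= *len);

  size_t j = start;
  while (j < *len && is_note (stream[j]))
    j++;

  if (j < *len && stream[j] == INSN_SET_GOT_EBX)
    {
      /* Shift the notes up by one slot, overwriting the old set_got.  */
      memmove (&stream[start + 1], &stream[start],
               (j - start) * sizeof *stream);
      stream[start] = INSN_SET_GOT_EBX;
      return FIXUP_MOVED;
    }

  /* No set_got right after the notes: emit an extra one.  */
  assert (*len < cap);
  memmove (&stream[start + 1], &stream[start],
           (*len - start) * sizeof *stream);
  stream[start] = INSN_SET_GOT_EBX;
  ++*len;
  return FIXUP_EMITTED;
}
```

Either way the set_got ends up in front of the PROLOGUE_END note, so the
profiler code emitted at that note sees the value in %ebx.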

> >> --- a/gcc/config/i386/i386.c
> >> +++ b/gcc/config/i386/i386.c
> >> @@ -39124,13 +39124,22 @@ x86_function_profiler (FILE *file, int
> >> labelno ATTRIBUTE_UNUSED)
> >>        else
> >>         x86_print_call_or_nop (file, mcount_name);
> >>      }
> >> +  /* At this stage we can't detrmine where GOT register is, as RA can 
> >> allocate
> >> +     it to any hard register.  Therefore we need to set it once again.  */
> >>    else if (flag_pic)
> >>      {
> >> +      pic_labels_used |= 1 << BX_REG;
> >> +      fprintf (file,"\tsub\t$16, %%esp\n");
> >> +      fprintf (file,"\tmovl\t%%ebx, (%%esp)\n");
> >> +      fprintf (file,"\tcall\t__x86.get_pc_thunk.bx\n");
> >> +      fprintf (file,"\taddl\t$_GLOBAL_OFFSET_TABLE_, %%ebx\n");
> >>  #ifndef NO_PROFILE_COUNTERS
> >>        fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%"
> >> PROFILE_COUNT_REGISTER "\n",
> >>                LPREFIX, labelno);
> >>  #endif
> >>        fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> >> +      fprintf (file,"\tmovl\t(%%esp), %%ebx\n");
> >> +      fprintf (file,"\tadd\t$16, %%esp\n");

Note, the unwind info is wrong even in this case.  Whenever you are in
between that call\t__x86.get_pc_thunk.bx and movl\t(%%esp), %%ebx,
there is no unwind info telling the debug info consumers that %ebx has been
saved to the stack and where, so any time the debugger or anything else
looks at outer frames, e.g. from _mcount, %ebx will contain a bogus
value in the caller of the function that contains the _mcount call.

        Jakub
