https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534

--- Comment #10 from Stupachenko Evgeny <evstupac at gmail dot com> ---
(In reply to Jakub Jelinek from comment #8)
> For -pg, at least for 32-bit -fpic, one way to handle this would be
> for !targetm.profile_before_prologue () && crtl->profile in ix86_init_pic_reg
> instead of emitting set_got into the pic_offset_table_rtx emit set_got into
> %ebx hard reg and then copy %ebx to the pic_offset_table_rtx (to strongly
> hint RA that it better should allocate the pic register at the start of the
> function to %ebx).  And then, when emitting prologue, see if the function
> doesn't start with set_got insn (after optional notes) loading into %ebx,
> and if it does, move the set_got insn right before the
> NOTE_INSN_PROLOGUE_END (on which final.c emits the _mcount call).  That way,
> there will be just a single set_got, not two.  If you don't find it for some
> reason (e.g. function that doesn't use PIC register otherwise, or something
> unexpected happened), make sure you treat %ebx as clobbered in the prologue
> and emit the set_got into %ebx directly right before
> NOTE_INSN_PROLOGUE_END.  For -m64 -fpic -mcmodel=large -pg this will be
> harder, as init_pic_reg emits multiple instructions.

Sounds reasonable. I also don't like two set_got insns back to back. However,
we should ask Vladimir how we can force the RA to allocate the pseudo GOT
register to %ebx. I expect there is an easier way to do this. And we should
prefer %ebx for the pseudo GOT register in all 32-bit cases.
For now we can emit the second set_got and file a bug on the potential
performance improvement in RA.
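
To make the prologue-time step from comment #8 concrete, here is a toy model of
it. This is only a sketch and not GCC code: Insn, Kind, Result and
finish_prologue are hypothetical names invented for illustration, and a plain
vector stands in for the real insn chain. It only shows the "reuse the leading
set_got if present, otherwise emit a fresh one" decision.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for insns; every name in this block is hypothetical.
enum class Kind { Note, SetGotEbx, PrologueEnd, Other };

struct Insn {
    Kind kind;
    std::string text;
};

struct Result {
    std::vector<Insn> insns;  // prologue, set_got, prologue-end note, body
    bool reused;              // true if the body's own set_got was moved
};

// Append exactly one set_got into %ebx right before the prologue-end note.
// If the body starts (after optional notes) with a set_got into %ebx, that
// insn is moved; otherwise a new one is emitted.
Result finish_prologue(std::vector<Insn> prologue, std::vector<Insn> body) {
    // Skip the optional leading notes in the function body.
    std::size_t i = 0;
    while (i < body.size() && body[i].kind == Kind::Note) ++i;

    Insn set_got{Kind::SetGotEbx, "set_got %ebx"};
    bool reused = false;
    if (i < body.size() && body[i].kind == Kind::SetGotEbx) {
        // Found it: take it out of the body so only one set_got remains.
        set_got = body[i];
        body.erase(body.begin() + static_cast<std::ptrdiff_t>(i));
        reused = true;
    }

    prologue.push_back(set_got);
    prologue.push_back({Kind::PrologueEnd, "NOTE_INSN_PROLOGUE_END"});
    prologue.insert(prologue.end(), body.begin(), body.end());
    return {std::move(prologue), reused};
}
```

In the fallback case (reused == false) a real implementation would also have to
treat %ebx as clobbered in the prologue, which this model does not represent.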

I've measured SPEC2000 -O2 -fprofile-generate execution on train data on a
Core i7. Even with the additional set_got the result is:
CINT +0.2
CFP  +1.4
compared to a compiler before "enabling ebx".
