https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63534
--- Comment #10 from Stupachenko Evgeny <evstupac at gmail dot com> ---
(In reply to Jakub Jelinek from comment #8)
> For -pg, at least for 32-bit -fpic, one way to handle this would be
> for !targetm.profile_before_prologue () && crtl->profile in ix86_init_pic_reg
> instead of emitting set_got into the pic_offset_table_rtx emit set_got into
> %ebx hard reg and then copy %ebx to the pic_offset_table_rtx (to strongly
> hint RA that it better should allocate the pic register at the start of the
> function to %ebx). And then, when emitting prologue, see if the function
> doesn't start with set_got insn (after optional notes) loading into %ebx,
> and if it does, move the set_got insn right before the
> NOTE_INSN_PROLOGUE_END (on which final.c emits the _mcount call). That way,
> there will be just a single set_got, not two. If you don't find it for some
> reason (e.g. function that doesn't use PIC register otherwise, or something
> unexpected happened), make sure you treat %ebx as clobbered in the prologue
> and emit the set_got into %ebx directly right before
> NOTE_INSN_PROLOGUE_END. For -m64 -fpic -mcmodel=large -pg this will be
> harder, as init_pic_reg emits multiple instructions.

Sounds reasonable. I also don't like two set_got insns back to back.
However, we should ask Vladimir how we can force RA to allocate the pseudo
GOT register to %ebx. I expect there is an easier way to do this. And we
should prefer %ebx for the pseudo GOT register in all 32-bit cases.

Right now we can "emit second set_got" and file a bug on the potential
performance improvement in RA.

I've measured SPEC2000 at -O2 with -fprofile-generate, run on train data on
a Core i7. Even with the additional set_got there is:
CINT +0.2
CFP +1.4
compared to a compiler before "enabling ebx".
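
To make the prologue step from comment #8 concrete, here is a rough toy model of it. This is only a sketch: insns are plain strings rather than GCC RTL, and `emit_prologue`, `is_note`, and the insn spellings are hypothetical names made up for illustration, not anything in the GCC sources. It only shows the decision "reuse the leading set_got if present, otherwise emit a fresh one before the prologue-end note".

```python
# Toy model only: insns are strings, not GCC RTL. All names are hypothetical.
SET_GOT_EBX = "set_got %ebx"
PROLOGUE_END = "NOTE_INSN_PROLOGUE_END"


def is_note(insn):
    """Treat anything spelled NOTE_* as a note in this toy model."""
    return insn.startswith("NOTE_")


def emit_prologue(prologue, body):
    """Return prologue + body with exactly one set_got right before PROLOGUE_END."""
    body = list(body)

    # Skip the optional notes at the start of the function body.
    i = 0
    while i < len(body) and is_note(body[i]):
        i += 1

    if i < len(body) and body[i] == SET_GOT_EBX:
        # The body starts with set_got into %ebx: move it, do not duplicate it.
        set_got = body.pop(i)
    else:
        # Fallback: not found, so emit a fresh set_got into %ebx.
        # (In the real compiler %ebx would also have to be treated as
        # clobbered in the prologue; the toy model does not track that.)
        set_got = SET_GOT_EBX

    # The _mcount call would be emitted at PROLOGUE_END, after set_got.
    return list(prologue) + [set_got, PROLOGUE_END] + body


if __name__ == "__main__":
    body = ["NOTE_INSN_FUNCTION_BEG", SET_GOT_EBX,
            "mov %ebx, pic_pseudo", "call foo@PLT"]
    for insn in emit_prologue(["push %ebp", "mov %esp, %ebp"], body):
        print(insn)
```

The copy from %ebx into the pseudo stays in the body in both cases, which is what would hint RA towards %ebx; whether RA actually honours that hint is the open question above.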