2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmaka...@redhat.com>:
> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in
>> 32bit PIC mode.  It was decided that the best approach would be to not fix
>> ebx register, use pseudo register for GOT base address and let allocator do
>> the rest.  This should be similar to how clang and icc work with GOT base
>> address.  I've been working for some time on such patch and now want to
>> share my results.
>>
>> The idea of the patch was very simple and included a few things:
>>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>> not have any hard reg fixed for PIC.
>>   2.  Initialize pic_offset_table_rtx with a new pseudo register in the
>> beginning of a function expand.
>>   3.  Change ABI so that there is a possible implicit PIC argument for
>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg
>> exists.
>>
>> Such approach worked well on small tests but trying to run some benchmarks
>> we faced a problem with reload of address constants.  The problem is that
>> when we try to rematerialize address constant or some constant memory
>> reference, we have to use pic_offset_table_rtx.  It means we insert new
>> usages of a pseudo register and the allocator cannot handle them correctly.  The
>> same problem also applies to float and vector constants.
>>
>> Rematerialization is not the only case causing new pic_offset_table_rtx
>> usage.  Another case is a split of some instructions using constant but not
>> having proper constraints.  E.g. pushtf pattern allows push of constant but
>> it has to be replaced with push of memory in reload pass causing additional
>> usage of pic_offset_table_rtx.
>>
>> There are two ways to fix it.  The first one is to support modifications
>> of pseudo register live range during reload and correctly allocate hard regs
>> for its new usages (currently we have some hard reg allocated for new usage
>> of pseudo reg but it may contain value of some other pseudo reg; thus we
>> reveal the problem at runtime only).
>>
>
> I believe there is already code to deal with this situation.  It is the code
> for risky transformations (please check flag lra_risky_transformation_p).  If
> this flag is set, the next LRA assign subpass runs and checks the
> correctness of assignments (e.g. it checks for the situation where two
> different pseudos have intersecting live ranges and the same assigned hard
> reg).  If such a dangerous situation is found, it is fixed.

I tried to remove my restrictions from setup_reg_equiv and initialize
lra_risky_transformation_p with 'true' in lra_constraints instead.  I
got only a 50% pass rate for SPEC2000 at -Ofast with LTO.  I will look
for the cause of the failures.
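
To make the check described above concrete, here is a toy sketch of
detecting two pseudos with intersecting live ranges and the same assigned
hard reg.  This is not LRA code: the function name, the pseudo names and
the representation of a live range as a single (start, end) interval are
all assumptions made for illustration.

```python
from itertools import combinations


def find_conflicts(live_ranges, assignment):
    """Return pairs of pseudos whose live ranges intersect and which
    were assigned the same hard register.

    live_ranges: hypothetical map pseudo -> (start, end), inclusive.
    assignment:  hypothetical map pseudo -> hard register name.
    """
    conflicts = []
    for a, b in combinations(sorted(live_ranges), 2):
        a_start, a_end = live_ranges[a]
        b_start, b_end = live_ranges[b]
        # Two closed intervals intersect when each starts before the
        # other one ends.
        overlap = a_start <= b_end and b_start <= a_end
        if overlap and assignment[a] == assignment[b]:
            conflicts.append((a, b))
    return conflicts


# Made-up example: p100 and p101 overlap and both got ebx.
live_ranges = {"p100": (0, 10), "p101": (5, 15), "p102": (12, 20)}
assignment = {"p100": "ebx", "p101": "ebx", "p102": "esi"}
conflicts = find_conflicts(live_ranges, assignment)
```

In this made-up example only the pair (p100, p101) is reported; p102
overlaps p101 but lives in a different hard reg.  A real allocator tracks
live ranges as sets of program points rather than one interval, so this
only shows the shape of the check.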

Ilya

>
>
>> The second way is to avoid all cases when new usages of
>> pic_offset_table_rtx appear in reload.  That is a way I chose because it
>> appeared simpler to me and would allow me to get some performance data
>> faster.  Also having rematerialization of address and float constants in PIC
>> mode would mean we have higher register pressure, thus having them on stack
>> should be even more efficient.  To achieve it I had to cut off reg equivs to
>> all exprs using symbol references and all constants living in the memory.  I
>> also had to avoid instructions requiring split in reload causing load of
>> constant from memory (*push[txd]f).
>>
>> Resulting compiler successfully passes make check, compiles EEMBC and
>> SPEC2000 benchmarks.  There is no confidence I covered all cases and there
>> still may be some templates causing split in reload with new
>> pic_offset_table_rtx usages.  I think support of reload with pseudo PIC
>> would be a better and more general solution.  But I don't know how difficult
>> it is to implement.  Any ideas on resolving this reload issue?
>>
>
> Please see what I mentioned above.  Maybe it can fix the degradation.
> Rematerialization is important for performance and switching it off
> completely is not wise.
>
>
>
>> I collected some performance numbers for EEMBC and SPEC2000 benchmarks.
>> Here are patch results for -Ofast optlevel with LTO collected on Avoton
>> server:
>> AUTOmark +1,9%
>> TELECOMmark +4,0%
>> DENmark +10,0%
>> SPEC2000 -0,5%
>>
>> There are few degradations on EEMBC benchmarks but on SPEC2000 situation
>> is different and we see more performance losses.  Some of them are caused by
>> disabled rematerialization of address constants.  In some cases relaxed ebx
>> causes more spills/fills in places where GOT is frequently used.  There are
>> also some minor fixes required in the patch to allow more efficient function
>> prolog (avoid unnecessary GOT register initialization and allow its
>> initialization without ebx usage).  I suppose some performance problems may be
>> resolved but a good fix for reload should go first.
>>
>>
>
> Ilya, the optimization you are trying to implement is important in many
> cases and should be in some way included in gcc.  If the degradations can be
> solved in a way I mentioned above we could introduce a machine-dependent
> flag.
>
