2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmaka...@redhat.com>:
> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> At Cauldron 2014 we had a couple of talks about relaxing ebx usage in
>> 32-bit PIC mode. It was decided that the best approach would be to not
>> fix the ebx register, use a pseudo register for the GOT base address, and
>> let the allocator do the rest. This should be similar to how clang and
>> icc work with the GOT base address. I've been working for some time on
>> such a patch and now want to share my results.
>>
>> The idea of the patch was very simple and included a few things:
>> 1. Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>> not have any hard reg fixed for PIC.
>> 2. Initialize pic_offset_table_rtx with a new pseudo register at the
>> beginning of function expand.
>> 3. Change the ABI so that there is a possible implicit PIC argument for
>> calls; pic_offset_table_rtx is used as the arg value if such an implicit
>> arg exists.
>>
>> This approach worked well on small tests, but when trying to run some
>> benchmarks we faced a problem with reload of address constants. The
>> problem is that when we try to rematerialize an address constant or some
>> constant memory reference, we have to use pic_offset_table_rtx. It means
>> we insert new usages of a pseudo register and the allocator cannot handle
>> that correctly. The same problem also applies to float and vector
>> constants.
>>
>> Rematerialization is not the only case causing new pic_offset_table_rtx
>> usage. Another case is a split of some instructions that use a constant
>> but do not have proper constraints. E.g. the pushtf pattern allows a push
>> of a constant, but it has to be replaced with a push of memory in the
>> reload pass, causing an additional usage of pic_offset_table_rtx.
>>
>> There are two ways to fix it.
>> The first one is to support modifications of the pseudo register's live
>> range during reload and correctly allocate hard regs for its new usages
>> (currently we have some hard reg allocated for a new usage of the pseudo
>> reg, but it may contain the value of some other pseudo reg; thus we
>> reveal the problem at runtime only).
>>
>
> I believe there is already code to deal with this situation. It is the
> code for risky transformations (please check the flag
> lra_risky_transformation_p). If this flag is set, the next lra assign
> subpass runs and checks the correctness of assignments (e.g. it checks
> for the situation where two different pseudos have intersecting live
> ranges and the same assigned hard reg). If such a dangerous situation is
> found, it is fixed.
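
The check described above (two different pseudos with intersecting live ranges and the same assigned hard reg) can be illustrated with a small sketch. This is not LRA's code: the Python helpers find_conflicts and fix_conflicts, the closed-interval live ranges, and the register names are all assumptions made up for the illustration.

```python
from itertools import combinations


def ranges_intersect(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_conflicts(live_range, hard_reg):
    """Return pairs of pseudos that are live together and share a hard reg."""
    conflicts = []
    for p, q in combinations(sorted(live_range), 2):
        if hard_reg[p] is None or hard_reg[p] != hard_reg[q]:
            continue  # spilled, or in different hard regs: no clash
        if ranges_intersect(live_range[p], live_range[q]):
            conflicts.append((p, q))
    return conflicts


def fix_conflicts(live_range, hard_reg, available_regs):
    """Move the second pseudo of each clashing pair to a free reg, else spill.

    Hypothetical repair policy, chosen only to keep the sketch short.
    """
    fixed = dict(hard_reg)
    for p, q in find_conflicts(live_range, hard_reg):
        if fixed[p] is None or fixed[p] != fixed[q]:
            continue  # an earlier repair already resolved this pair
        busy = {
            fixed[o]
            for o in live_range
            if o != q
            and fixed[o] is not None
            and ranges_intersect(live_range[o], live_range[q])
        }
        free = [r for r in available_regs if r not in busy]
        fixed[q] = free[0] if free else None  # None models a spill
    return fixed


# A new use of the PIC pseudo landed in ebx while "a" already lives there.
live_range = {"pic": (0, 10), "a": (2, 5), "b": (6, 9)}
hard_reg = {"pic": "ebx", "a": "ebx", "b": "esi"}

conflicts = find_conflicts(live_range, hard_reg)
fixed = fix_conflicts(live_range, hard_reg, ["ebx", "esi", "edi"])
```

In this toy input the PIC pseudo and "a" clash in ebx, so the repair moves the PIC pseudo to edi; with no free register it would be spilled instead.
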

I tried to remove my restrictions from setup_reg_equiv and initialize
lra_risky_transformation_p with 'true' in lra_constraints instead. I got
only a 50% pass rate for SPEC2000 at -Ofast with LTO. I will look for the
cause of the failures.

Ilya

>
>
>> The second way is to avoid all cases where new usages of
>> pic_offset_table_rtx appear in reload. That is the way I chose because it
>> appeared simpler to me and would allow me to get some performance data
>> faster. Also, having rematerialization of address and float constants in
>> PIC mode would mean we have higher register pressure, thus having them on
>> the stack should be even more efficient. To achieve it I had to cut off
>> reg equivs to all exprs using symbol references and all constants living
>> in memory. I also had to avoid instructions that require a split in
>> reload causing a load of a constant from memory (*push[txd]f).
>>
>> The resulting compiler successfully passes make check and compiles the
>> EEMBC and SPEC2000 benchmarks. I am not confident I covered all cases,
>> and there still may be some templates causing a split in reload with new
>> pic_offset_table_rtx usages. I think support of reload with a pseudo PIC
>> register would be a better and more general solution, but I don't know
>> how difficult it is to implement. Any ideas on resolving this reload
>> issue?
>>
>
> Please see what I mentioned above. Maybe it can fix the degradation.
> Rematerialization is important for performance and switching it off
> completely is not wise.
>
>
>
>> I collected some performance numbers for the EEMBC and SPEC2000
>> benchmarks. Here are the patch results for the -Ofast optlevel with LTO,
>> collected on an Avoton server:
>> AUTOmark +1,9%
>> TELECOMmark +4,0%
>> DENmark +10,0%
>> SPEC2000 -0,5%
>>
>> There are few degradations on the EEMBC benchmarks, but on SPEC2000 the
>> situation is different and we see more performance losses. Some of them
>> are caused by disabled rematerialization of address constants.
>> In some cases relaxed ebx causes more spills/fills in places where the
>> GOT is frequently used. There are also some minor fixes required in the
>> patch to allow a more efficient function prolog (avoid unnecessary GOT
>> register initialization and allow its initialization without ebx usage).
>> I suppose some performance problems may be resolved, but a good fix for
>> reload should go first.
>>
>>
>
> Ilya, the optimization you are trying to implement is important in many
> cases and should be in some way included in gcc. If the degradations can
> be solved in the way I mentioned above, we could introduce a
> machine-dependent flag.
>