This is a really good question! Consider the requirements of this optimization.

1. There should be at least 2 methods to load a global variable's
   address from the GOT. Usually this means using different relocation
   types.
2. By default, all global variable accesses use the same method.
3. In some cases (fewer than X global variable accesses) method A is
   better; in other cases method B is better.

With these constraints a simplify_GOT optimization pass is applicable.
But these constraints are too weak. The new optimization pass can do
almost nothing except call a target-specific hook. I am not sure such a
pass would be acceptable.

We can also add more constraints:

4. Restrict method A as follows: first load the base address of the GOT
   into a register pic_reg, then load the real global variable's
   address as

       load offset_reg, the offset from GOT base to the GOT entry
       load address, [pic_reg + offset_reg]

With this constraint the new pass knows there is a special register
pic_reg, so it can look for and count all uses of pic_reg. If all uses
are method A and the count is more than the target-specific threshold,
then the uses can be rewritten as method B. The method detection and
rewriting should be target specific.

I don't know how other targets handle global address access with -fpic,
or how many targets satisfy these 4 constraints.

thanks
Guozhi

On Fri, Apr 2, 2010 at 4:31 AM, Steven Bosscher <stevenb....@gmail.com> wrote:
> On Thu, Apr 1, 2010 at 8:10 PM, Andrew Haley <a...@redhat.com> wrote:
>> On 28/03/10 15:45, Carrot Wei wrote:
>>> Hi
>>>
>>> The detailed description of the optimization is at
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129. This is an ARM
>>> specific optimization.
>>>
>>> This optimization uses one less register (the register hold the GOT
>>> base), to get this beneficial the ideal place for it should be before
>>> register allocation.
>>>
>>> Usually expand pass generates instructions to load global variable's
>>> address from GOT entry for each access of the global variable. Later
>>> cse/gcse passes can remove many of them. In order to precisely model
>>> the cost, this optimization should be put after some cse/gcse passes.
>>>
>>> So what is the best place for this optimization? Is there any existed
>>> pass can be enhanced with this optimization? Or should I add a new
>>> pass?
>>
>> The obvious place is machine-dependent reorg, which is a very late pass.
>
> Yes, and after register allocation, i.e. too late for Guozhi.
>
> Basically there is no place right now to stuff a pass like that.
> Question is: Is this optimization really, reallyreallyreally so target
> specific that a target-independent pass is not the better option?
>
> Ciao!
> Steven
>
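
To make the counting step under constraint 4 of the reply concrete, here
is a minimal sketch in plain C. It is only an illustration under stated
assumptions: the type got_use_kind and the function
should_rewrite_to_method_b are hypothetical names, not GCC internals,
and it assumes a target hook has already classified each use of pic_reg.
The comparison direction follows the wording of the reply (rewrite when
the count is more than the threshold).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one use of pic_reg, as a
   target-specific hook might report it.  Not a GCC type.  */
enum got_use_kind {
    GOT_USE_METHOD_A,   /* load address, [pic_reg + offset_reg] */
    GOT_USE_OTHER       /* any other use of pic_reg */
};

/* Return true when every recorded use of pic_reg is a method A
   address load and the number of uses is more than THRESHOLD.
   THRESHOLD stands in for the target-specific value.  */
bool should_rewrite_to_method_b(const enum got_use_kind *uses,
                                size_t n_uses, size_t threshold)
{
    size_t count = 0;

    assert(uses != NULL || n_uses == 0);

    for (size_t i = 0; i < n_uses; i++) {
        /* A single use that is not method A blocks the rewrite,
           because pic_reg would still be needed afterwards.  */
        if (uses[i] != GOT_USE_METHOD_A)
            return false;
        count++;
    }
    return count > threshold;
}
```

In this sketch the target-independent part only counts and compares.
Recognizing method A and emitting method B would both stay behind
target hooks, as described in the reply.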