On Sat, Aug 6, 2011 at 5:00 PM, Paolo Bonzini <bonz...@gnu.org> wrote:
> On 08/04/2011 01:10 PM, Andrew Haley wrote:
>>>>
>>>> >> It's the sort of thing that gets done in threaded interpreters,
>>>> >> where you really need to keep a few pointers in registers and
>>>> >> the interpreter itself is a very long function. gcc has always
>>>> >> done a dreadful job of register allocation in such cases.
>>>
>>> >
>>> > Sure, but what I have seen people use global register variables
>>> > for this (which means they get taken away from the register
>>> > allocator).
>>
>> Not always though, and the x86 has so few registers that using a
>> global register variable is very problematic. I suppose you could
>> compile the threaded interpreter in a file of its own, but I'm not
>> sure that has quite the same semantics as local register variables.
>
> Indeed, local register variables give almost the same benefit as globals
> with half the burden. The idea is that you don't care about the exact
> register that holds the contents but, by specifying a callee-save register,
> GCC will use those instead of memory across calls. This reduces _a lot_ the
> number of spills.
>
>> The problem is that people who care about this stuff very much don't
>> always read...@gcc.gnu.org so won't be heard. But in their own world
>> (LISP, Forth) nice features like register variables and labels as
>> values have led to gcc being the preferred compiler for this kind of
>> work.
>
> /me raises hands.
>
> For GNU Smalltalk, using
>
> #if defined(__i386__)
> # define __DECL_REG1 __asm("%esi")
> # define __DECL_REG2 __asm("%edi")
> # define __DECL_REG3 /* no more callee-save regs if PIC is in use! */
> #endif
>
> #if defined(__x86_64__)
> # define __DECL_REG1 __asm("%r12")
> # define __DECL_REG2 __asm("%r13")
> # define __DECL_REG3 __asm("%rbx")
> #endif
>
> ...
>
> register unsigned char *ip __DECL_REG1;
> register OOP * sp __DECL_REG2;
> register intptr_t arg __DECL_REG3;
>
> improves performance by up to 20% if I remember correctly. I can benchmark
> it if desired.
>
> It does not come for free, in some cases the register allocator does some
> stupid things due to the hard register declaration. But it gets much better
> code overall, so who cares about the microoptimization.
>
> Of course, if the register allocator did the right thing, or if I could use
> simply
>
> unsigned char *ip __attribute__((__do_not_spill_me__(20)));
> OOP *sp __attribute__((__do_not_spill_me__(10)));
> intptr_t arg __attribute__((__do_not_spill_me__(0)));
>
> that would be just fine.

Like if

  register unsigned char *ip;

would increase spill cost of ip compared to

  unsigned char *ip;

?  It is, after all, a cost issue - forcefully pinning down registers
can lead to problems.  We'd of course have to somehow "preserve" the
register state of ip for all relevant pseudos (and avoid coalescing
with non-register ones).

Richard.

> Paolo
>
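
For readers who have not seen the idiom, here is a minimal sketch of the technique under discussion: a threaded interpreter loop that combines labels as values with local register variables pinned to callee-saved registers. It is not GNU Smalltalk's code. The opcodes, the `run_bytecode` function, and the `DECL_REG1`/`DECL_REG2` macros are all made up for illustration, and only x86-64 is covered (elsewhere the macros expand to nothing). Note also that, as far as I know, current GCC documentation only guarantees the register binding when such variables are used as asm operands, so in plain C code the declaration acts more like a strong hint.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register choices, modelled on the macros quoted above.
   On other targets they expand to nothing and the variables become
   ordinary locals. */
#if defined(__GNUC__) && defined(__x86_64__)
# define DECL_REG1 __asm("%r12")
# define DECL_REG2 __asm("%r13")
#else
# define DECL_REG1
# define DECL_REG2
#endif

/* Toy opcodes, invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_HALT };

intptr_t run_bytecode(const unsigned char *code)
{
    /* Dispatch table of label addresses (GNU C "labels as values"). */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_halt };

    intptr_t stack[16];

    /* Ask for callee-saved registers so the values need not be
       spilled to memory around calls made by the handlers. */
    register const unsigned char *ip DECL_REG1 = code;
    register intptr_t *sp DECL_REG2 = stack;

    /* Fetch the next opcode and jump straight to its handler. */
#define NEXT() goto *dispatch[*ip++]

    NEXT();

do_push:
    *sp++ = (intptr_t) *ip++;   /* operand byte follows the opcode */
    NEXT();

do_add:
    sp--;                       /* pop the right-hand operand */
    sp[-1] += sp[0];            /* add it into the new top of stack */
    NEXT();

do_halt:
    return sp[-1];              /* result is the top of stack */
#undef NEXT
}
```

Calling `run_bytecode` on the byte sequence `OP_PUSH, 40, OP_PUSH, 2, OP_ADD, OP_HALT` returns 42. A real interpreter's handlers would call out to helper functions, which is where keeping `ip` and `sp` in callee-saved registers is supposed to pay off.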