On Sat, Aug 6, 2011 at 5:00 PM, Paolo Bonzini <bonz...@gnu.org> wrote:
> On 08/04/2011 01:10 PM, Andrew Haley wrote:
>>>>
>>>> >>  It's the sort of thing that gets done in threaded interpreters,
>>>> >>  where you really need to keep a few pointers in registers and
>>>> >>  the interpreter itself is a very long function.  gcc has always
>>>> >>  done a dreadful job of register allocation in such cases.
>>>
>>> >
>>> >  Sure, but I have seen people use global register variables
>>> >  for this (which means they get taken away from the register
>>> > allocator).
>>
>> Not always though, and the x86 has so few registers that using a
>> global register variable is very problematic.  I suppose you could
>> compile the threaded interpreter in a file of its own, but I'm not
>> sure that has quite the same semantics as local register variables.
>
> Indeed, local register variables give almost the same benefit as globals
> with half the burden.  The idea is that you don't care about the exact
> register that holds the contents but, by specifying a callee-save register,
> GCC will keep the value there instead of in memory across calls.  This
> reduces the number of spills _a lot_.
>
>> The problem is that people who care about this stuff very much don't
>> always read...@gcc.gnu.org  so won't be heard.  But in their own world
>> (LISP, Forth) nice features like register variables and labels as
>> values have led to gcc being the preferred compiler for this kind of
>> work.
>
> /me raises hands.
>
> For GNU Smalltalk, using
>
> #if defined(__i386__)
> # define __DECL_REG1 __asm("%esi")
> # define __DECL_REG2 __asm("%edi")
> # define __DECL_REG3 /* no more callee-save regs if PIC is in use!  */
> #endif
>
> #if defined(__x86_64__)
> # define __DECL_REG1 __asm("%r12")
> # define __DECL_REG2 __asm("%r13")
> # define __DECL_REG3 __asm("%rbx")
> #endif
>
> ...
>
>  register unsigned char *ip __DECL_REG1;
>  register OOP * sp __DECL_REG2;
>  register intptr_t arg __DECL_REG3;
>
> improves performance by up to 20% if I remember correctly.  I can benchmark
> it if desired.
>
> It does not come for free: in some cases the register allocator does some
> stupid things due to the hard register declaration.  But it gets much better
> code overall, so who cares about the microoptimization?
>
> Of course, if the register allocator did the right thing, or if I could use
> simply
>
>  unsigned char *ip __attribute__((__do_not_spill_me__(20)));
>  OOP *sp __attribute__((__do_not_spill_me__(10)));
>  intptr_t arg __attribute__((__do_not_spill_me__(0)));
>
> that would be just fine.

Like if

register unsigned char *ip;

would increase the spill cost of ip compared to

unsigned char *ip;

?  It is, after all, a cost issue - forcefully pinning down registers can
lead to problems.  We'd of course have to somehow "preserve" the
register state of ip for all relevant pseudos (and avoid coalescing with
non-register ones).
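
For reference, a minimal sketch of the kind of code under discussion: a toy
threaded interpreter (not GNU Smalltalk's) that uses labels as values and
declares ip and sp as local register variables.  The names PIN_REG1,
PIN_REG2, the opcodes and run_bytecode() are made up for illustration, and
only x86-64 with GCC is pinned; elsewhere the macros expand to nothing.
Note that the GCC manual only guarantees the named register for operands of
extended asm, so whether the values really stay in r12/r13 between
dispatches is up to the allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macros in the spirit of the __DECL_REGn ones quoted above. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
# define PIN_REG1 __asm__("r12")   /* callee-saved on x86-64 */
# define PIN_REG2 __asm__("r13")   /* callee-saved on x86-64 */
#else
# define PIN_REG1                  /* plain locals everywhere else */
# define PIN_REG2
#endif

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Toy stack machine: runs CODE until OP_HALT, returns the top of stack. */
intptr_t run_bytecode(const unsigned char *code)
{
    /* Labels as values (GNU C): one dispatch target per opcode. */
    static void *const dispatch[] = {
        &&do_push, &&do_add, &&do_mul, &&do_halt
    };
    intptr_t stack[16];

    /* The two hot pointers, pinned to callee-saved registers if possible. */
    register const unsigned char *ip PIN_REG1 = code;
    register intptr_t *sp PIN_REG2 = stack;

#define NEXT() goto *dispatch[*ip++]
    NEXT();

do_push:
    assert(sp < stack + 16);       /* guard against overflowing the toy stack */
    *sp++ = (intptr_t)*ip++;       /* operand is the byte after the opcode */
    NEXT();
do_add:
    sp--;
    sp[-1] += sp[0];
    NEXT();
do_mul:
    sp--;
    sp[-1] *= sp[0];
    NEXT();
do_halt:
    return sp[-1];
#undef NEXT
}
```

Called on { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT },
run_bytecode() returns 20.  Dropping the PIN_REGn macros leaves the plain
"register" declarations, which is the variant whose spill cost the question
above is about.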

Richard.

> Paolo
>
