Re: limiting call clobbered registers for library functions

Paul Shortis Mon, 02 Feb 2015 13:56:35 -0800

On 02/02/15 18:55, Yury Gribov wrote:

On 01/30/2015 11:16 AM, Matthew Fortune wrote:
Yury Gribov <y.gri...@samsung.com> writes:
On 01/29/2015 08:32 PM, Richard Henderson wrote:
On 01/29/2015 02:08 AM, Paul Shortis wrote:
I've ported GCC to a small 16 bit CPU that has single bit shifts. So
I've handled variable / multi-bit shifts using a mix of inline shifts
and calls to assembler support functions.
The calls to the asm library functions clobber only one (by const) or
two (variable) registers but of course calling these functions causes
all of the standard call clobbered registers to be considered
clobbered, thus wasting lots of candidate registers for use in
expressions surrounding these shifts and causing unnecessary register
saves in the surrounding function prologue/epilogue.
I've scrutinized and cloned the actions of other ports that do the
same, however I'm unable to convince the various passes that only r1
and r2 can be clobbered by these library calls.
Is anyone able to point me in the proper direction for a solution to
this problem?
You wind up writing a pattern that contains a call, but isn't
represented in rtl as a call.
Could it be useful to provide a pragma for specifying function register
usage? This would allow e.g. a library writer to write a hand-optimized
assembly version and then inform the compiler of its binary interface.
Currently a surrogate of this can be achieved by putting inline asm code
in static inline functions in public library headers but this has its
own disadvantages (e.g. code bloat).
This sounds like a good idea in principle. I seem to recall seeing something
similar to this in other compiler frameworks that allow a number of special
calling conventions to be defined and enable functions to be attributed to use
one of them. I.e. not quite so general as specifying an arbitrary clobber list
but some sensible pre-defined alternative conventions.
FYI a colleague from kernel mentioned that they already achieve this by wrapping the actual call with inline asm e.g.
static inline int foo(int x) {
  asm(
    ".global foo_core\n"
    // foo_core accepts single parameter in %rax,
    // returns result in %rax and
    // clobbers %rbx
    "call foo_core\n"
    : "+a"(x)
    :
    : "rbx"
  );
  return x;
}
We still can't mark inline asm with things like __attribute__((pure)), etc. though so it's not an ideal solution.
-Y

Thanks everyone.

I've finally settled on an extension of the solution offered by Richard (from the SH port).


I've had to ...

    1.    write an expander that expands ...

        a) short (bit count) constant shifts to an instruction pattern that emits asm for one or more inline shifts
        b) other constant shifts to another instruction pattern that emits asm for a library call and clobbers cc
        c) variable shifts to another instruction pattern that emits asm for a library call and clobbers cc plus r2

then for compare elimination two other instruction patterns corresponding to b) and c) that set CC from a compare instead of clobbering it.

I could have avoided the expander and used a single instruction pattern for a) b) c) if I could have found a way to have alternative dependent clobbers in an instruction pattern. I investigated attributes but couldn't see how I would be able to achieve what I needed. I also tried clobber (match_dup 2), but when one of the alternatives has a constant for operands[2] the clobber is accepted silently by the .md compiler but doesn't actually clobber the non-constant alternatives.

A mechanism to implement alternative dependent clobbers would have allowed all this to be represented in a much more succinct and compact manner.
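
For readers who don't read .md files, the three-way split a) b) c) above can be sketched in plain C. This is only a toy model, not GCC code: the names classify_shift and clobbers_for, the CLOB_* bit encoding and the INLINE_SHIFT_MAX threshold are all made up for illustration (the real cut-off for a "short" shift isn't stated above), and the empty clobber set for case a) is an assumption, since no clobbers are listed for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: the largest constant count still done inline. */
enum { INLINE_SHIFT_MAX = 2 };

/* Hypothetical bit encoding of the registers a pattern clobbers. */
enum { CLOB_CC = 1, CLOB_R2 = 2 };

enum shift_kind {
    SHIFT_INLINE,        /* a) one or more inline single-bit shifts      */
    SHIFT_LIBCALL_CONST, /* b) library call, clobbers cc                 */
    SHIFT_LIBCALL_VAR    /* c) library call, clobbers cc plus r2         */
};

/* Pick the pattern the way the expander is described as doing. */
enum shift_kind classify_shift(bool count_is_const, unsigned count)
{
    if (!count_is_const)
        return SHIFT_LIBCALL_VAR;
    if (count <= INLINE_SHIFT_MAX)
        return SHIFT_INLINE;
    return SHIFT_LIBCALL_CONST;
}

/* Clobber set attached to each pattern (empty set for a) is assumed). */
unsigned clobbers_for(enum shift_kind kind)
{
    switch (kind) {
    case SHIFT_INLINE:        return 0;
    case SHIFT_LIBCALL_CONST: return CLOB_CC;
    case SHIFT_LIBCALL_VAR:   return CLOB_CC | CLOB_R2;
    }
    assert(0 && "unknown shift_kind");
    return 0;
}
```

The point of the sketch is that the clobber set depends on which case is chosen, which is exactly what a single instruction pattern with alternatives cannot currently express.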

On another note... compare elimination didn't work for pattern c) and on inspection I found this snippet in compare-elim.c


static bool
arithmetic_flags_clobber_p (rtx insn)
{
...
...
  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) == 2)
    {

which of course rejects any parallel pattern with a main rtx and more than one clobber (i.e. (c) above). So I changed this function so that it accepts patterns where rtx's one and upwards are all clobbers, one being cc. The resulting generated asm ...

        ld      r2,r4               ; r2 holds the shift count; it will be clobbered to
                                    ; calculate an index into the sequence of shift instructions
        call    __ashl_v            ; variable ashl
        beq     .L4                 ; branch on result
        ld      r1,r3

If anyone can see any fault in this change, please call it out.
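
To make the relaxed rule easier to review, here is a toy model of it in plain C. It is not the actual patch and uses none of GCC's RTL types: struct elt, the ELT_* codes, the REG_* numbers and both function names are invented stand-ins, and the parts of arithmetic_flags_clobber_p elided with "..." above are not modelled. The strict version is assumed to mean "a main set followed by a single clobber of cc"; the relaxed version accepts a main set followed by any number of clobbers, at least one of which is cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the elements of a PARALLEL (hypothetical names). */
enum elt_code { ELT_SET, ELT_CLOBBER, ELT_USE };
enum { REG_R1 = 1, REG_R2 = 2, REG_CC = 100 }; /* hypothetical numbering */

struct elt {
    enum elt_code code;
    int reg; /* register set or clobbered by this element */
};

/* Model of the quoted check: exactly two elements, set + clobber of cc. */
bool flags_clobber_strict(const struct elt *pat, size_t len)
{
    assert(pat != NULL || len == 0);
    return len == 2
        && pat[0].code == ELT_SET
        && pat[1].code == ELT_CLOBBER
        && pat[1].reg == REG_CC;
}

/* Model of the relaxed check: element 0 is the main set, elements
   1..len-1 are all clobbers, and at least one of them is cc. */
bool flags_clobber_relaxed(const struct elt *pat, size_t len)
{
    assert(pat != NULL || len == 0);
    if (len < 2 || pat[0].code != ELT_SET)
        return false;

    bool saw_cc = false;
    for (size_t i = 1; i < len; i++) {
        if (pat[i].code != ELT_CLOBBER)
            return false;          /* anything but a clobber is rejected */
        if (pat[i].reg == REG_CC)
            saw_cc = true;
    }
    return saw_cc;
}
```

In this model a pattern shaped like b) (set, clobber cc) passes both checks, while one shaped like c) (set, clobber cc, clobber r2) passes only the relaxed one.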

Of course, the problem with using inline asm is that you have to arrange for them to be included in EVERY compile and they don't allow compare elimination to be easily implemented.


Paul.
