Calculating instruction costs

2013-07-09 Thread David Given
I'm working on a gcc backend for an architecture. The architecture has
instructions for indexed array access; so, ld r0, (r1, r2) is equivalent
to r0 = r1[r2] where r1 is an int32_t*.

I'm representing this in the .md file with the following pattern:

(define_insn "*si_load_indexed"
  [
    (set
      (match_operand:SI 0 "register_operand" "=r")
      (mem:SI
        (plus:SI
          (mult:SI
            (match_operand:SI 1 "register_operand" "%r")
            (const_int 4))
          (match_operand:SI 2 "register_operand" "r"))))
  ]
  ""
  "ld %0, (%2, %1)"
  [(set_attr "length" "4")]
)

However, the instruction is never actually being emitted. Looking at the
debug output from the instruction combining stage, I see this:

Trying 8, 9 -> 10:
Successfully matched this instruction:
(set (reg:SI 47 [ *_5 ])
(mem:SI (plus:SI (mult:SI (reg/v:SI 43 [ b ])
(const_int 4 [0x4]))
(reg:SI 0 r0 [ a ])) [2 *_5+0 S4 A32]))
rejecting combination of insns 8, 9 and 10
original costs 8 + 4 + 4 = 16
replacement cost 32

Instructions 8, 9 and 10 are:

(insn 8 5 9 2 (set (reg:SI 45)
(ashift:SI (reg/v:SI 43 [ b ])
(const_int 2 [0x2]))) test.c:5 15 {ashlsi3}
 (expr_list:REG_DEAD (reg/v:SI 43 [ b ])
(nil)))
(insn 9 8 10 2 (set (reg/f:SI 46)
(plus:SI (reg/v/f:SI 42 [ a ])
(reg:SI 45))) test.c:5 13 {addsi3}
 (expr_list:REG_DEAD (reg:SI 45)
(expr_list:REG_DEAD (reg/v/f:SI 42 [ a ])
(nil))))
(insn 10 9 15 2 (set (reg:SI 47 [ *_5 ])
(mem:SI (reg/f:SI 46) [2 *_5+0 S4 A32])) test.c:5 6 {*si_load}
 (expr_list:REG_DEAD (reg/f:SI 46)
(nil)))

If I've read this correctly, it indicates that the instruction pattern
has been matched, but the instruction has been rejected due to being
more expensive than the original instructions.
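
The rejection in the dump boils down to comparing summed costs. The following is a minimal sketch of that comparison, assuming a simplified stand-alone model rather than GCC's actual combine code; `total_cost` and `combination_is_profitable` are hypothetical names. The numbers in the comments are the ones from the dump. With GCC's `COSTS_N_INSNS (1)` being 4, a cost of 16 corresponds to four simple instructions and 32 to eight.

```c
#include <stddef.h>

/* Sum the costs of the original insns (8 + 4 + 4 in the dump). */
int total_cost(const int *costs, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += costs[i];
    return sum;
}

/* Hypothetical model of the check: a combination is kept only if the
   replacement is no more expensive than the insns it replaces. */
int combination_is_profitable(const int *original_costs, size_t n,
                              int replacement_cost)
{
    return replacement_cost <= total_cost(original_costs, n);
}
```

Under this model, original costs of {8, 4, 4} against a replacement cost of 32 are rejected, which matches the "rejecting combination" line in the dump.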

So, how is it calculating the cost of my instruction? Where's it getting
that 32 from (which seems weirdly high)?

Right now all the cost macros are left as the default, which is probably
the root of the problem; but I'm having a lot of trouble getting my head
around them. In the interest of actually getting something to work, are
there any ways of using a simplified cost model where the cost of each
instruction is specified manually in the instruction pattern alongside
the length? (Or even just *using* the length as the cost...)

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0:  The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual



signature.asc
Description: OpenPGP digital signature


HAVE_ATTR_enabled mishandling?

2013-07-09 Thread David Given
I think I have found a bug. This is in stock gcc 4.8.1...

My backend does not use the 'enabled' attribute; therefore the following
code in insn-attr.h kicks in:

  #ifndef HAVE_ATTR_enabled
  #define HAVE_ATTR_enabled 0
  #endif

Therefore the following code in gcc/lra-constraints.c is enabled:

  #ifdef HAVE_ATTR_enabled
  if (curr_id->alternative_enabled_p != NULL
      && ! curr_id->alternative_enabled_p[nalt])
    continue;
  #endif

->alternative_enabled_p is bogus; therefore segfault.

Elsewhere I see structures of the form:

  #if HAVE_ATTR_enabled
  ...
  #endif

So I think that #ifdef above is a straight typo. Certainly, changing it
to a #if makes the crash go away...
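
The difference between the two guards can be shown in isolation. This is a small stand-alone sketch, not code from GCC: it defines the macro to 0 the same way insn-attr.h does and reports which preprocessor branch is compiled. The function names are made up for illustration.

```c
/* Mimic insn-attr.h for a backend without the 'enabled' attribute. */
#ifndef HAVE_ATTR_enabled
#define HAVE_ATTR_enabled 0
#endif

/* Returns 1 if a block guarded by #ifdef would be compiled in. */
int compiled_with_ifdef(void)
{
#ifdef HAVE_ATTR_enabled
    return 1;   /* taken: the macro is defined, even though its value is 0 */
#else
    return 0;
#endif
}

/* Returns 1 if a block guarded by #if would be compiled in. */
int compiled_with_if(void)
{
#if HAVE_ATTR_enabled
    return 1;
#else
    return 0;   /* taken: the macro's value is 0 */
#endif
}
```

Because the macro is always defined, `#ifdef` can never exclude the guarded code; only `#if` looks at the value.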

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "Every planet is weird. I spent six weeks on a moon where the
│ principal form of recreation was juggling geese. Baby geese. Goslings.
│ They were juggled." --- Firefly, _Our Mrs. Reynolds_





Re: Calculating instruction costs

2013-07-10 Thread David Given
Michael Matz wrote:
[...]
> As you didn't adjust any cost I would guess the high value comes from the 
> default implementation of address_cost, which simply uses arithmetic cost, 
> and the MULT in there is quite expensive by default.
> 
> See TARGET_ADDRESS_COST in several ports.

Oddly, TARGET_ADDRESS_COST is never being called for my port, but yes,
my not having implemented any costing appears to be fundamentally the issue.

After having done a bunch of reading up on how costing works, and
deciphering the rather cryptic other ports, my understanding is:

Costing is based entirely on analysis of the RTL, and is completely
independent of which insns are selected. Therefore if my backend wants to
support certain optimised addressing modes, I need to insert code into
my TARGET_RTX_COSTS hook that looks for mem constructions which can be
represented by such addressing modes, and encourages the compiler to
select them by giving them a low cost. I don't get any assistance from
the patterns in the .md file.
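
To make the idea concrete, here is a toy model of such a cost function. It is only a sketch: `struct toy_rtx`, the `TOY_*` codes and `toy_rtx_cost` are invented stand-ins, not GCC's rtx or the real TARGET_RTX_COSTS signature, and the cost of 5 insns for a multiply is an arbitrary assumption. It shows the shape of the logic: recognise the (mem (plus (mult reg 4) reg)) form first and give it the cost of a single instruction, otherwise fall back to summing the cost of the operations.

```c
#include <stddef.h>

enum toy_code { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_MULT, TOY_MEM };

struct toy_rtx {
    enum toy_code code;
    long value;                  /* used by TOY_CONST_INT only */
    const struct toy_rtx *op0;   /* first operand, or NULL */
    const struct toy_rtx *op1;   /* second operand, or NULL */
};

#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* True for (mult (reg) (const_int 4)). */
static int is_scaled_index(const struct toy_rtx *x)
{
    return x && x->code == TOY_MULT
        && x->op0 && x->op0->code == TOY_REG
        && x->op1 && x->op1->code == TOY_CONST_INT && x->op1->value == 4;
}

/* True for (mem (plus (mult (reg) (const_int 4)) (reg))). */
int is_indexed_load(const struct toy_rtx *x)
{
    if (!x || x->code != TOY_MEM || !x->op0 || x->op0->code != TOY_PLUS)
        return 0;
    const struct toy_rtx *sum = x->op0;
    return is_scaled_index(sum->op0) && sum->op1 && sum->op1->code == TOY_REG;
}

/* Cheap special case first, generic recursive costing otherwise. */
int toy_rtx_cost(const struct toy_rtx *x)
{
    if (!x)
        return 0;
    if (is_indexed_load(x))
        return TOY_COSTS_N_INSNS(1);   /* a single indexed ld */
    switch (x->code) {
    case TOY_REG:
    case TOY_CONST_INT:
        return 0;
    case TOY_MULT:                     /* assumed expensive by default */
        return TOY_COSTS_N_INSNS(5) + toy_rtx_cost(x->op0) + toy_rtx_cost(x->op1);
    default:
        return TOY_COSTS_N_INSNS(1) + toy_rtx_cost(x->op0) + toy_rtx_cost(x->op1);
    }
}
```

In this model the whole indexed load costs 4, while the bare address expression on its own still costs 24 because of the multiply.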

Have I got that right?

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0:  The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual





mach pass deleting instructions?

2013-07-22 Thread David Given
So I'm trying to get compare-and-branch working on my architecture. I
have the following patterns:

(define_expand "cbranchsf4"
  [(set
     (reg:CC CC_REGNO)
     (compare:CC
       (match_operand:SF 1 "register_operand")
       (match_operand:SF 2 "register_operand")))
   (set
     (pc)
     (if_then_else
       (match_operator 0 "comparison_operator"
         [(reg:CC CC_REGNO)
          (const_int 0)])
       (label_ref
         (match_operand 3 "" ""))
       (pc)))]
  ""
  {}
)

(define_insn "*test_sf"
  [(set
     (reg:CC CC_REGNO)
     (compare
       (match_operand:SF 0 "register_operand" "r")
       (match_operand:SF 1 "register_operand" "r")))]
  ""
  "fcmp %0, %1, %1"
  [(set_attr "length" "4")]
)

(define_insn "*branch_"
  [(set
     (pc)
     (if_then_else
       (condition
         (reg:CC CC_REGNO)
         (const_int 0))
       (label_ref
         (match_operand 0))
       (pc)))]
  ""
  "b %0"
  [(set_attr "length" "4")]
)

The architecture is utterly traditional and the code above is stolen
pretty much intact from the moxie port (which I'm using as a reference
because it seems to be simple and easy to understand).

When I actually try to build stuff, however, the branch gets emitted but
then silently deleted during the mach pass. The debug tracing (as
produced by -da) doesn't say why; it just removes it. Naturally the
resulting program doesn't work. Example:

int cmp(float a, float b)
{ return a>b; }

->

cmp:
  push r6, lr
  mov r6, #1 ; fast
  fcmp r0, r1, r1
  <--- branch instruction to .L2 should be here
  mov r6, #0 ; fast
.L2:
  mov r0, r6 ; fast
  pop r6, pc

Does anyone have any suggestions as to what I'm doing wrong, and where
to start looking? For example, what is the mach pass actually trying to
do, and is there any way to get it to give me more information about why
it's doing it?

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0:  The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual





Re: Strange optimization in GCC 4.7.2

2013-07-23 Thread David Given
Konstantin Vladimirov wrote:
[...]
> x = (y & ~(1 << 7)) | (((value >> 9) & 1) << 7);
[...]
> x = y & 4294967167 | (value >> 9) << 7 & 255; <- WAT?


   ((value >> 9) & 1) << 7
== ((value >> 9) << 7) & (1 << 7)
== ((value >> 9) << 7) & 0x80
== ((value >> 9) << 7) & 0xff

...I think.

That last step is probably being done because anding with 0xff is really
cheap on x86 (you just pick the appropriate subreg --- al instead of
eax, for example).
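
The chain of equalities can be checked by brute force. This is a small sketch assuming unsigned 32-bit operands (the types in the original code are not shown in the quoted text); the function names are made up for illustration.

```c
#include <stdint.h>

/* The expression as written in the source. */
uint32_t insert_bit_original(uint32_t y, uint32_t value)
{
    return (y & ~(UINT32_C(1) << 7)) | (((value >> 9) & 1) << 7);
}

/* The expression as rewritten by the compiler, with C precedence made
   explicit. 4294967167 is 0xFFFFFF7F, i.e. ~(1 << 7). */
uint32_t insert_bit_rewritten(uint32_t y, uint32_t value)
{
    return (y & UINT32_C(4294967167)) | (((value >> 9) << 7) & 255);
}

/* Count inputs in a small sample range for which the two forms disagree. */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (uint32_t value = 0; value < 4096; value++)
        for (uint32_t y = 0; y < 512; y++)
            if (insert_bit_original(y, value) != insert_bit_rewritten(y, value))
                mismatches++;
    return mismatches;
}
```

The masks 0x80 and 0xff are interchangeable here because the left shift by 7 has already cleared the low seven bits.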

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0:  The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual





Re: mach pass deleting instructions?

2013-07-23 Thread David Given
David Given wrote:
[...]
> When I actually try to build stuff, however, the branch gets emitted but
> then silently deleted during the mach pass.

Solved: it turned out to be old code in the TARGET_MACHINE_DEPENDENT_REORG
hook, dating from the port I was basing my backend on, which was mangling
my code. I disabled the target hook and it all works now.

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0:  The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual





Problems with register elimination

2013-07-27 Thread David Given
I am having a great deal of trouble getting register elimination (and
stack frame layouts in general) working properly on my architecture.
There is some fundamental issue I'm simply not getting here.

My architecture is a fairly vanilla RISC system with a link pointer. The
stack frame layout I'm aiming for looks like this:

  hi   incoming_params
   == ap
   callee_saves
   --
   local_vars
   local_vars_padding
   -- fp
   outgoing_params
  lo   -- sp

The docs say that because I don't know where the locals are until I
know how big callee_saves is, I have to use the following setup:

#define STACK_POINTER_REGNUM SP_REG
#define FRAME_POINTER_REGNUM FP_REG /* virtual frame pointer */
#define HARD_FRAME_POINTER_REGNUM R6_REG /* real frame pointer */
#define ARG_POINTER_REGNUM AP_REG /* virtual argument pointer */

AP_REG and FP_REG are fake registers (values 27 and 28 respectively;
different from R6_REG and SP_REG). These get eliminated into either the
stack or r6 as follows:

#define ELIMINABLE_REGS \
{{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM }, \
 { ARG_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM }, \
 { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM }, \
 { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM }}

This is all utterly standard, and mostly stolen from other ports...
except I can't make it work, in various weird ways.
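
For reference, the offsets implied by the layout above can be worked out with a toy model. This is only a sketch under assumptions: the stack grows downward, the "ap", "fp" and "sp" marks in the diagram are taken literally, and `struct toy_frame` and `toy_elimination_offset` are invented names rather than anything from GCC. Where the hard frame pointer (r6) ends up is not stated in the diagram, so it is left out.

```c
struct toy_frame {
    int callee_saves_size;
    int local_vars_size;        /* including local_vars_padding */
    int outgoing_params_size;
};

enum toy_reg { TOY_AP, TOY_FP, TOY_SP };

/* Distance of each mark above sp, reading the diagram bottom-up. */
static int height_above_sp(const struct toy_frame *f, enum toy_reg r)
{
    switch (r) {
    case TOY_SP:
        return 0;
    case TOY_FP:
        return f->outgoing_params_size;
    default: /* TOY_AP */
        return f->outgoing_params_size + f->local_vars_size
             + f->callee_saves_size;
    }
}

/* Offset to add to register `to` in order to obtain register `from`. */
int toy_elimination_offset(const struct toy_frame *f,
                           enum toy_reg from, enum toy_reg to)
{
    return height_above_sp(f, from) - height_above_sp(f, to);
}
```

With 8 bytes of callee saves, 16 bytes of locals and 12 bytes of outgoing parameters, this gives ap = sp + 36 and fp = sp + 12.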

Weirdness (1): I never see ARG_POINTER_REGNUM used to access arguments.
The compiler seems to want to access function arguments via
FRAME_POINTER_REGNUM plus a small value, which means they overlap the
locals. (It's definitely using the same numeric ranges. It looks like
it's trying to use ARG_POINTER_REGNUM but is getting the wrong register.)

Weirdness (2): the following test function generates code which tries to
copy AP_REG into a register without eliminating it.

void* return_local(void) { int i; return &i; }

It turns into the following RTL:

(insn 17 2 12 2 (set (reg/i:SI 0 r0)
(reg:SI 27 ?ap)) test.c:14 4
 (nil))
(insn 12 17 15 2 (use (reg/i:SI 0 r0)) test.c:14 -1
 (nil))

Why isn't elimination happening in this situation? And why is AP_REG
being used here at all?

I've been looking at the various backends, but they're not very helpful
--- they're all rather different, and I can't see anything they're doing
which I'm not or vice versa.

However, I am particularly perturbed by the following comment from the
MCore port:

/* Note that the name `fp' is horribly misleading since `fp' is in fact
   only the argument-and-return-context pointer. */

I don't know whether this is just talking about the MCore, or gcc in
general --- I find it interesting that most backends which use a fake
frame pointer seem to end up with FRAME_POINTER_REGNUM and
HARD_FRAME_POINTER_REGNUM pointing at different addresses.

If anyone can offer any suggestions as to what I'm doing wrong --- or,
better still, point me at more in-depth reading on how all this is
supposed to work --- I'd be very grateful!

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "Every planet is weird. I spent six weeks on a moon where the
│ principal form of recreation was juggling geese. Baby geese. Goslings.
│ They were juggled." --- Firefly, _Our Mrs. Reynolds_





Re: converting rtx object to the assembly instruction.

2013-08-15 Thread David Given
David Malcolm wrote:
[...]
> Out of interest, how portable is open_memstream (and if not, is there a
> good portable way of doing this)?  I have to do similar things in my
> python plugin for GCC, and currently I'm using fmemopen.  IIRC that
> latter one is not available on OS X, and was one of the biggest issues
> last time I tried to get it working there.

It seems to be POSIX, albeit fairly recent (2008, according to the man
page); glibc looks like it's had it for ages. The interwebs suggest that
OSX doesn't have it, which is a shame, as it looks dead handy.
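
For anyone who hasn't used it, a minimal sketch of open_memstream follows. It assumes a POSIX.1-2008 system; `format_insn_name` and its output format are made up for illustration and have nothing to do with GCC's own printing routines.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format text into a heap buffer through a FILE*; the caller frees the
   result. Returns NULL on failure. */
char *format_insn_name(const char *name, int length)
{
    char *buffer = NULL;
    size_t size = 0;

    FILE *stream = open_memstream(&buffer, &size);
    if (stream == NULL)
        return NULL;

    fprintf(stream, "%s (length %d)", name, length);

    /* buffer and size are only guaranteed valid after fflush or fclose. */
    if (fclose(stream) != 0) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```

The attraction is that any existing code which prints to a FILE* can be pointed at the stream unchanged, and the result comes back as an ordinary NUL-terminated string.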

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0:  The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual





Re: gnu software bugs - long double

2013-11-02 Thread David Given
On 02/11/13 19:48, Mischa Baars wrote:
[...]
> I have written a couple of new trigonometric functions for use in
> the library, and actually I need this to function properly.

The point is that 1.1 simply cannot be represented precisely as an IEEE
floating point number, for precisely the same reasons that 1/3 cannot
be represented precisely as a decimal number (0.333...). This is
intrinsic to the way that floating point numbers work. If you try,
you'll get the closest number that IEEE floats *can* represent.

If you really need a completely precise representation of 1.1, then
you're not going to be able to use IEEE floats --- you'll have to use
decimals or some sort of fractional representation instead. I don't
know if gcc can help you with those, but there are endless helper
libraries that will do both for you. They're usually pretty slow, though.
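
A short sketch of both points, assuming IEEE single and double precision for float and double. The fraction type is a deliberately naive illustration (no overflow handling, no reduction), not a recommendation of any particular library; all names are made up.

```c
#include <stdio.h>

/* Returns 1 when 1.1 rounds to different values as a float and as a
   double. If 1.1 were exactly representable, both would hold the same
   value and this would return 0. */
int float_and_double_differ(void)
{
    float f = 1.1f;
    double d = 1.1;
    return (double)f != d;
}

/* Print the values actually stored, to 20 decimal places. */
void print_nearest_values(void)
{
    printf("float : %.20f\n", (double)1.1f);
    printf("double: %.20f\n", 1.1);
}

/* A naive exact fraction: 1.1 is stored as 11/10. */
struct toy_fraction { long num; long den; };

struct toy_fraction toy_fraction_mul(struct toy_fraction a,
                                     struct toy_fraction b)
{
    struct toy_fraction r = { a.num * b.num, a.den * b.den };
    return r;
}

/* Compare by cross-multiplication, so 110/10 equals 11/1. */
int toy_fraction_equal(struct toy_fraction a, struct toy_fraction b)
{
    return a.num * b.den == b.num * a.den;
}
```

With the fraction form, 11/10 times 10 is exactly 11, at the price of doing integer arithmetic on two numbers for every operation.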

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "There does not now, nor will there ever, exist a programming
│ language in which it is the least bit hard to write bad programs." ---
│ Flon's Axiom