Calculating instruction costs
I'm working on a gcc backend for an architecture. The architecture has instructions for indexed array access; so, ld r0, (r1, r2) is equivalent to r0 = r1[r2] where r1 is an int32_t*.

I'm representing this in the .md file with the following pattern:

    (define_insn "*si_load_indexed"
      [
        (set
          (match_operand:SI 0 "register_operand" "=r")
          (mem:SI
            (plus:SI
              (mult:SI
                (match_operand:SI 1 "register_operand" "%r")
                (const_int 4))
              (match_operand:SI 2 "register_operand" "r"))))
      ]
      ""
      "ld %0, (%2, %1)"
      [(set_attr "length" "4")]
    )

However, the instruction is never actually being emitted. Looking at the debug output from the instruction combining stage, I see this:

    Trying 8, 9 -> 10:
    Successfully matched this instruction:
    (set (reg:SI 47 [ *_5 ])
        (mem:SI (plus:SI (mult:SI (reg/v:SI 43 [ b ])
                    (const_int 4 [0x4]))
                (reg:SI 0 r0 [ a ])) [2 *_5+0 S4 A32]))
    rejecting combination of insns 8, 9 and 10
    original costs 8 + 4 + 4 = 16
    replacement cost 32

Instructions 8, 9 and 10 are:

    (insn 8 5 9 2 (set (reg:SI 45)
            (ashift:SI (reg/v:SI 43 [ b ])
                (const_int 2 [0x2]))) test.c:5 15 {ashlsi3}
         (expr_list:REG_DEAD (reg/v:SI 43 [ b ])
            (nil)))
    (insn 9 8 10 2 (set (reg/f:SI 46)
            (plus:SI (reg/v/f:SI 42 [ a ])
                (reg:SI 45))) test.c:5 13 {addsi3}
         (expr_list:REG_DEAD (reg:SI 45)
            (expr_list:REG_DEAD (reg/v/f:SI 42 [ a ])
                (nil))))
    (insn 10 9 15 2 (set (reg:SI 47 [ *_5 ])
            (mem:SI (reg/f:SI 46) [2 *_5+0 S4 A32])) test.c:5 6 {*si_load}
         (expr_list:REG_DEAD (reg/f:SI 46)
            (nil)))

If I've read this correctly, the instruction pattern has been matched, but the combination has been rejected because it is more expensive than the original instructions.

So, how is it calculating the cost of my instruction? Where's it getting that 32 from (which seems weirdly high)? Right now all the cost macros are left as the default, which is probably the root of the problem; but I'm having a lot of trouble getting my head around them.
In the interest of actually getting something to work, are there any ways of using a simplified cost model where the cost of each instruction is specified manually in the instruction pattern alongside the length? (Or even just *using* the length as the cost...)

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "USER'S MANUAL VERSION 1.0: The information presented in this
│ publication has been carefully for reliability." --- anonymous
│ computer hardware manual
HAVE_ATTR_enabled mishandling?
I think I have found a bug. This is in stock gcc 4.8.1...

My backend does not use the 'enabled' attribute; therefore the following code in insn-attr.h kicks in:

    #ifndef HAVE_ATTR_enabled
    #define HAVE_ATTR_enabled 0
    #endif

Therefore the following code in gcc/lra-constraints.c is enabled:

    #ifdef HAVE_ATTR_enabled
          if (curr_id->alternative_enabled_p != NULL
              && ! curr_id->alternative_enabled_p[nalt])
            continue;
    #endif

curr_id->alternative_enabled_p is bogus; therefore segfault.

Elsewhere I see structures of the form:

    #if HAVE_ATTR_enabled
    ...
    #endif

So I think that #ifdef above is a straight typo. Certainly, changing it to a #if makes the crash go away...

-- 
┌─── dg@cowlark.com ─ http://www.cowlark.com ─
│ "Every planet is weird. I spent six weeks on a moon where the
│ principal form of recreation was juggling geese. Baby geese. Goslings.
│ They were juggled." --- Firefly, _Our Mrs. Reynolds_
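The difference between the two preprocessor tests can be reproduced in isolation. The following is a minimal sketch, not gcc code: `FEATURE_FLAG` is a made-up stand-in for `HAVE_ATTR_enabled`, and both helper functions are hypothetical.

```c
/* Mimic the generated header: the macro is always defined, but as 0
   when the feature is absent.  FEATURE_FLAG is a hypothetical name
   standing in for HAVE_ATTR_enabled.  */
#ifndef FEATURE_FLAG
#define FEATURE_FLAG 0
#endif

/* Returns 1 if code guarded by #ifdef would be compiled in.  */
int guarded_by_ifdef(void)
{
#ifdef FEATURE_FLAG
    return 1;   /* taken: the macro is defined, even though it is 0 */
#else
    return 0;
#endif
}

/* Returns 1 if code guarded by #if would be compiled in.  */
int guarded_by_if(void)
{
#if FEATURE_FLAG
    return 1;
#else
    return 0;   /* taken: the macro's value is 0 */
#endif
}
```

With the macro defined as 0, only the `#if` form excludes the guarded code, which is consistent with the crash disappearing once the test is changed.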
Re: Calculating instruction costs
Michael Matz wrote:
[...]
> As you didn't adjust any cost I would guess the high value comes from the
> default implementation of address_cost, which simply uses arithmetic cost,
> and the MULT in there is quite expensive by default.
>
> See TARGET_ADDRESS_COST in several ports.

Oddly, TARGET_ADDRESS_COST is never being called for my port, but yes, my not having implemented any costing appears to be fundamentally the issue.

After having done a bunch of reading up on how costing works, and deciphering the rather cryptic other ports, my understanding is:

Costing is based entirely on analysis of the RTL, and is completely independent of which insns are selected. Therefore if my backend wants to support certain optimised addressing modes, I need to insert code into my TARGET_RTX_COSTS hook that looks for mem constructions which can be represented by such addressing modes, and encourages the compiler to select them by giving them a low cost. I don't get any assistance from the patterns in the .md file.

Have I got that right?
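To make the arithmetic concrete, here is a toy model of tree-based costing. It is a sketch under stated assumptions, not gcc's implementation: the `toy_*` types and functions are invented for illustration, and the default costs (registers free, a multiply worth five instructions, every other operation worth one) are assumptions chosen because they reproduce the "replacement cost 32" figure from the combine dump. Only the `COSTS_N_INSNS` scaling mirrors the real gcc macro.

```c
#include <stddef.h>

/* Same scaling as gcc's COSTS_N_INSNS: one instruction = 4 units.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* A tiny stand-in for RTL expressions; this is not gcc's rtx type.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_MULT, TOY_MEM };

struct toy_rtx {
    enum toy_code code;
    long value;                    /* only used by TOY_CONST_INT */
    const struct toy_rtx *op[2];   /* operands, NULL when absent */
};

/* Assumed generic defaults: registers are free, a multiply costs five
   instructions, everything else costs one.  */
static int toy_default_op_cost(enum toy_code code)
{
    switch (code) {
    case TOY_REG:  return 0;
    case TOY_MULT: return COSTS_N_INSNS(5);
    default:       return COSTS_N_INSNS(1);
    }
}

/* Hypothetical target check: is X (mem (plus (mult reg 4) reg))?  */
static int toy_is_indexed_load(const struct toy_rtx *x)
{
    const struct toy_rtx *sum, *scaled;

    if (x->code != TOY_MEM)
        return 0;
    sum = x->op[0];
    if (sum == NULL || sum->code != TOY_PLUS)
        return 0;
    scaled = sum->op[0];
    if (scaled == NULL || scaled->code != TOY_MULT)
        return 0;
    if (sum->op[1] == NULL || sum->op[1]->code != TOY_REG)
        return 0;
    return scaled->op[0] != NULL && scaled->op[0]->code == TOY_REG
        && scaled->op[1] != NULL && scaled->op[1]->code == TOY_CONST_INT
        && scaled->op[1]->value == 4;
}

/* Cost of X: the sum over the whole tree, unless the (hypothetical)
   target hook recognises the expression and prices it as a unit.  */
int toy_rtx_cost(const struct toy_rtx *x, int use_target_hook)
{
    int total, i;

    if (use_target_hook && toy_is_indexed_load(x))
        return COSTS_N_INSNS(1);

    total = toy_default_op_cost(x->code);
    for (i = 0; i < 2; i++)
        if (x->op[i] != NULL)
            total += toy_rtx_cost(x->op[i], use_target_hook);
    return total;
}

/* Cost of the source of the combined load, (mem (plus (mult b 4) a)).  */
int toy_indexed_load_cost(int use_target_hook)
{
    static const struct toy_rtx b = { TOY_REG, 0, { NULL, NULL } };
    static const struct toy_rtx a = { TOY_REG, 0, { NULL, NULL } };
    static const struct toy_rtx four = { TOY_CONST_INT, 4, { NULL, NULL } };
    static const struct toy_rtx scaled = { TOY_MULT, 0, { &b, &four } };
    static const struct toy_rtx sum = { TOY_PLUS, 0, { &scaled, &a } };
    static const struct toy_rtx load = { TOY_MEM, 0, { &sum, NULL } };

    return toy_rtx_cost(&load, use_target_hook);
}
```

Under these assumptions `toy_indexed_load_cost(0)` sums mem (4) + plus (4) + mult (20) + const_int (4), while `toy_indexed_load_cost(1)` prices the whole address as a single instruction. A real port would do the equivalent pattern match inside its cost hooks on gcc's own rtx structures.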
mach pass deleting instructions?
So I'm trying to get compare-and-branch working on my architecture. I have the following patterns:

    (define_expand "cbranchsf4"
      [(set (reg:CC CC_REGNO)
            (compare:CC
              (match_operand:SF 1 "register_operand")
              (match_operand:SF 2 "register_operand")))
       (set (pc)
            (if_then_else
              (match_operator 0 "comparison_operator"
                [(reg:CC CC_REGNO) (const_int 0)]
              )
              (label_ref (match_operand 3 "" ""))
              (pc))
       )]
      ""
      {}
    )

    (define_insn "*test_sf"
      [(set (reg:CC CC_REGNO)
            (compare
              (match_operand:SF 0 "register_operand" "r")
              (match_operand:SF 1 "register_operand" "r")))]
      ""
      "fcmp %0, %1, %1"
      [(set_attr "length" "4")]
    )

    (define_insn "*branch_"
      [(set (pc)
            (if_then_else
              (condition (reg:CC CC_REGNO) (const_int 0))
              (label_ref (match_operand 0))
              (pc)))]
      ""
      "b %0"
      [(set_attr "length" "4")]
    )

The architecture is utterly traditional and the code above is stolen pretty much intact from the moxie port (which I'm using as a reference because it seems to be simple and easy to understand).

When I actually try to build stuff, however, the branch gets emitted but then silently deleted during the mach pass. The debug tracing (as produced by -da) doesn't say why; it just removes it. Naturally the resulting program doesn't work.

Example:

    int cmp(float a, float b)
    { return a>b; }

->

    cmp:
        push r6, lr
        mov r6, #1 ; fast
        fcmp r0, r1, r1
        <--- branch instruction to .L2 should be here
        mov r6, #0 ; fast
    .L2:
        mov r0, r6 ; fast
        pop r6, pc

Does anyone have any suggestions as to what I'm doing wrong, and where to start looking? For example, what is the mach pass actually trying to do, and is there any way to get it to give me more information about why it's doing it?
Re: Strange optimization in GCC 4.7.2
Konstantin Vladimirov wrote:
[...]
> x = (y & ~(1 << 7)) | (((value >> 9) & 1) << 7);
[...]
> x = y & 4294967167 | (value >> 9) << 7 & 255; <- WAT?

    ((value >> 9) & 1) << 7
      == ((value >> 9) << 7) & (1 << 7)
      == ((value >> 9) << 7) & 0x80
      == ((value >> 9) << 7) & 0xff

...I think. That last step is probably being done because ANDing with 0xff is really cheap on x86 (you just pick the appropriate subreg: al instead of eax, for example).
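The chain of equalities can be checked mechanically. The sketch below assumes `x`, `y` and `value` are 32-bit unsigned integers (the quoted snippet does not show their types), and the function names are made up for illustration. The last step holds because the low seven bits of `(value >> 9) << 7` are always zero, so masking with 0xff keeps only bit 7.

```c
#include <stdint.h>

/* The expression as written in the source (types assumed unsigned).  */
uint32_t insert_bit_original(uint32_t y, uint32_t value)
{
    return (y & ~(UINT32_C(1) << 7)) | (((value >> 9) & 1) << 7);
}

/* The form shown in the compiler's dump; 4294967167 is 0xFFFFFF7F.  */
uint32_t insert_bit_rewritten(uint32_t y, uint32_t value)
{
    return (y & UINT32_C(4294967167)) | (((value >> 9) << 7) & 255);
}

/* Returns 1 if both forms agree on a spread of sample inputs.  */
int forms_agree(void)
{
    uint32_t i;

    for (i = 0; i < (UINT32_C(1) << 20); i++) {
        /* Multiply by an odd constant to scatter bits across the word. */
        uint32_t value = i * UINT32_C(2654435761);
        uint32_t y = ~value ^ i;

        if (insert_bit_original(y, value) != insert_bit_rewritten(y, value))
            return 0;
    }
    return 1;
}
```

A sampled check like this is not a proof, but it is a quick way to confirm that the rewritten expression is not a miscompilation.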
Re: mach pass deleting instructions?
David Given wrote:
[...]
> When I actually try to build stuff, however, the branch gets emitted but
> then silently deleted during the mach pass.

Solved: it turned out to be old code in the TARGET_MACHINE_DEPENDENT_REORG hook, dating from the port I was basing my backend on, which was mangling my code. I disabled the target hook and it all works now.
Problems with register elimination
I am having a great deal of trouble getting register elimination (and stack frame layouts in general) working properly on my architecture. There is some fundamental issue I'm simply not getting here.

My architecture is a fairly vanilla RISC system with a link pointer. The stack frame layout I'm aiming for looks like this:

    hi   incoming_params
         ==================== ap
         callee_saves
         --------------------
         local_vars
         local_vars_padding
         -------------------- fp
         outgoing_params
    lo   -------------------- sp

The docs say that because I don't know where the locals are until I know how big callee_saves is, I have to use the following setup:

    #define STACK_POINTER_REGNUM SP_REG
    #define FRAME_POINTER_REGNUM FP_REG /* virtual frame pointer */
    #define HARD_FRAME_POINTER_REGNUM R6_REG /* real frame pointer */
    #define ARG_POINTER_REGNUM AP_REG /* virtual argument pointer */

AP_REG and FP_REG are fake registers (values 27 and 28 respectively; different from R6_REG and SP_REG). These get eliminated into either the stack pointer or r6 as follows:

    #define ELIMINABLE_REGS \
      {{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM }, \
       { ARG_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM }, \
       { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM }, \
       { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM }}

This is all utterly standard, and mostly stolen from other ports... except I can't make it work, in various weird ways.

Weirdness (1): I never see ARG_POINTER_REGNUM used to access arguments. The compiler seems to want to access function arguments via FRAME_POINTER_REGNUM plus a small value, which means they overlap the locals. (It's definitely using the same numeric ranges. It looks like it's trying to use ARG_POINTER_REGNUM but is getting the wrong register.)

Weirdness (2): the following test function generates code which tries to copy AP_REG into a register without eliminating it.

    void* return_local(void)
    {
      int i;
      return &i;
    }

It turns into the following RTL:

    (insn 17 2 12 2 (set (reg/i:SI 0 r0)
            (reg:SI 27 ?ap)) test.c:14 4
         (nil))
    (insn 12 17 15 2 (use (reg/i:SI 0 r0)) test.c:14 -1
         (nil))

Why isn't elimination happening in this situation? And why is AP_REG being used here at all?

I've been looking at the various backends, but they're not very helpful: they're all rather different, and I can't see anything they're doing which I'm not or vice versa. However, I am particularly perturbed by the following comment from the MCore port:

    /* Note that the name `fp' is horribly misleading since `fp' is in fact
       only the argument-and-return-context pointer. */

I don't know whether this is just talking about the MCore, or gcc in general. I find it interesting that most backends which use a fake frame pointer seem to end up with FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM pointing at different addresses.

I'd appreciate any suggestions as to what I'm doing wrong or, better still, pointers to more in-depth reading on how all this is supposed to work!
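For the frame layout described in this message, the offsets an elimination has to apply can be worked out with simple arithmetic. The following is a toy sketch, not code from any gcc port: the `toy_*` names are hypothetical, the stack is assumed to grow downwards, and r6 is assumed to hold the same address as the virtual frame pointer, which the message does not state.

```c
/* Sizes in bytes of the regions in the frame diagram.  */
struct toy_frame {
    int callee_saves;
    int local_vars;        /* including local_vars_padding */
    int outgoing_params;
};

/* Stand-ins for SP_REG, R6_REG, FP_REG and AP_REG.  */
enum toy_reg { TOY_SP, TOY_HARD_FP, TOY_SOFT_FP, TOY_AP };

/* Address held by each pointer, relative to sp.  */
static int toy_offset_from_sp(enum toy_reg reg, const struct toy_frame *f)
{
    switch (reg) {
    case TOY_SP:
        return 0;
    case TOY_HARD_FP:   /* assumption: r6 is set to the virtual fp address */
    case TOY_SOFT_FP:
        return f->outgoing_params;
    case TOY_AP:
        return f->outgoing_params + f->local_vars + f->callee_saves;
    }
    return 0;
}

/* Value to add to TO so that it addresses the same location as FROM.  */
int toy_elimination_offset(enum toy_reg from, enum toy_reg to,
                           const struct toy_frame *f)
{
    return toy_offset_from_sp(from, f) - toy_offset_from_sp(to, f);
}
```

For example, with 8 bytes of callee saves, 16 bytes of locals and 12 bytes of outgoing parameters, eliminating the argument pointer into the stack pointer needs an offset of 36, and into the hard frame pointer an offset of 24. In a real port this is the kind of value the INITIAL_ELIMINATION_OFFSET macro is expected to produce for each pair listed in ELIMINABLE_REGS.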
Re: converting rtx object to the assembly instruction.
David Malcolm wrote:
[...]
> Out of interest, how portable is open_memstream (and if not, is there a
> good portable way of doing this)? I have to do similar things in my
> python plugin for GCC, and currently I'm using fmemopen. IIRC that
> latter one is not available on OS X, and was one of the biggest issues
> last time I tried to get it working there.

It seems to be POSIX, though fairly recent (2008, according to the man page); glibc looks like it's had it for ages. The interwebs suggest that OS X doesn't have it, which is a shame, as it looks dead handy.
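For readers who have not met it, here is a minimal sketch of how open_memstream is typically used to capture stdio output in a heap buffer. It assumes a POSIX.1-2008 C library; `format_to_string` is a hypothetical helper, not something from the plugin under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Formats "NAME = VALUE" into a freshly allocated string using a
   memory stream.  The caller must free() the result.  Returns NULL
   if the stream cannot be created or closed.  */
char *format_to_string(const char *name, int value)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *stream = open_memstream(&buf, &len);

    if (stream == NULL)
        return NULL;

    fprintf(stream, "%s = %d", name, value);

    /* buf and len are only guaranteed up to date after fflush/fclose. */
    if (fclose(stream) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Any function that writes to a `FILE *` can be pointed at the stream in place of the `fprintf` call, which is what makes it convenient for capturing the output of existing printing routines.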
Re: gnu software bugs - long double
On 02/11/13 19:48, Mischa Baars wrote:
[...]
> I have written a couple of new trigonometric functions for use in
> the library, and actually I need this to function properly.

The point is that 1.1 simply cannot be represented precisely as an IEEE floating point number, for precisely the same reasons that 1/3 cannot be represented precisely as a decimal number (0.333...). This is intrinsic to the way that floating point numbers work. If you try, you'll get the closest number that IEEE floats *can* represent.

If you really need a completely precise representation of 1.1, then you're not going to be able to use IEEE floats; you'll have to use decimals or some sort of fractional representation instead. I don't know if gcc can help you with those, but there are endless helper libraries that will do both for you. They're usually pretty slow, though.

-- 
http://www.cowlark.com
"There does not now, nor will there ever, exist a programming
language in which it is the least bit hard to write bad programs."
--- Flon's Axiom
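One way to see the rounding is to store the same literal at two precisions and compare. This is a small sketch assuming float and double are the usual IEEE 754 binary32 and binary64 types; the function names are made up. If 1.1 had an exact binary representation, both variables would hold the same value.

```c
/* Returns 1 if 1.1 rounds to different values as float and as double,
   which can only happen if it has no exact binary representation.
   volatile forces each value to be stored at its declared precision.  */
int one_point_one_is_inexact(void)
{
    volatile float as_float = 1.1f;
    volatile double as_double = 1.1;

    return (double)as_float != as_double;
}

/* Control case: 1.5 is 1 + 1/2, so it is exact at both precisions.  */
int one_point_five_is_exact(void)
{
    volatile float as_float = 1.5f;
    volatile double as_double = 1.5;

    return (double)as_float == as_double;
}
```

The same reasoning applies to long double: it adds more bits, so the stored value gets closer to 1.1, but it never reaches it.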