Hi Segher, Thanks for having a look at the patch.
On Thu, Jul 23, 2020 at 01:34:22PM -0500, Segher Boessenkool wrote: > Hi! > > On Thu, Jul 23, 2020 at 04:56:14PM +0100, Jozef Lawrynowicz wrote: > > +static int > > +msp430_insn_cost (rtx_insn *insn, bool speed ATTRIBUTE_UNUSED) > > +{ > > + int cost; > > + > > + if (recog_memoized (insn) < 0) > > + return 0; > > + > > + cost = get_attr_length (insn); > > + if (TARGET_DEBUG_INSN_COSTS) > > + { > > + fprintf (stderr, "cost %d for insn:\n", cost); > > + debug_rtx (insn); > > + } > > + > > + /* The returned cost must be relative to COSTS_N_INSNS (1). An insn with > > a > > + length of 2 bytes is the smallest possible size and so must be > > equivalent > > + to COSTS_N_INSNS (1). */ > > + return COSTS_N_INSNS (cost) / (2 * COSTS_N_INSNS (1)); > > This is the same as "cost / 2", so "length / 2" here, which doesn't look > right. The returned value should have the same "unit" as COSTS_N_INSNS > does, so maybe you want COSTS_N_INSNS (length / 2) ? Indeed it looks like I made a thinko in that calculation in TARGET_INSN_COSTS; trying to make it verbose to show the thought behind the calculation backfired :) Fixing it to return "COSTS_N_INSNS (length / 2)" actually made codesize noticeably worse for most of my benchmarks. I had to define BRANCH_COST to indicate branches are not cheap. In the original patch a cheap instruction would have a cost of 1. When using the default BRANCH_COST of 1 to calculate the cost of a branch compared to an insn (e.g. in ifcvt.c), BRANCH_COST would be wrapped in COSTS_N_INSNS, scaling the cost to 4, which suitably disparaged it vs the cheap insn cost of 1. With the fixed insn_cost calculation, a cheap instruction would cost 4 with the COSTS_N_INSNS scaling, and a branch would cost the same, which is not right. Fixed in the attached patch. > > > + /* FIXME Add more detailed costs when optimizing for speed. > > + For now the length of the instruction is a good approximiation and > > roughly > > + correlates with cycle cost. * > > COSTS_N_INSNS (1) is 4, so that you can make things cost 5, 6, 7 to be a > cost intermediate to COSTS_N_INSNS (1) and COSTS_N_INSNS (2). This is > very useful, scaling down the costs destroys that. > > > +mdebug-insn-costs > > +Target Report Mask(DEBUG_INSN_COSTS) > > +Print insns and their costs as calculated by TARGET_INSN_COSTS. > > It is already printed in the generated asm with -dp? Not sure if you > want more detail than that. > > '-dp' > Annotate the assembler output with a comment indicating which > pattern and alternative is used. The length and cost of each > instruction are also printed. > During development I found it useful to see the insns in RTL format and their costs alongside that. In hindsight, it doesn't really have much use in the finalized patch, so I've removed it. Thanks! Jozef > > Segher
>From 8b69c5a38006d30d001561d47f7ecbd9bd3ead78 Mon Sep 17 00:00:00 2001 From: Jozef Lawrynowicz <joze...@mittosystems.com> Date: Thu, 16 Jul 2020 11:34:50 +0100 Subject: [PATCH 4/5] MSP430: Implement TARGET_INSN_COST The length of an insn can be used to calculate its cost, when optimizing for size. When optimizing for speed, this is a good estimate, since the cycle cost of an MSP430 instruction increases with its length. gcc/ChangeLog: * config/msp430/msp430.c (TARGET_INSN_COST): Define. (msp430_insn_cost): New function. * config/msp430/msp430.opt: Add -mdebug-insn-costs option. * config/msp430/msp430.h (BRANCH_COST): Define. (LOGICAL_OP_NON_SHORT_CIRCUIT): Define. gcc/testsuite/ChangeLog: * gcc.target/msp430/rtx-cost-O3-default.c: New test. * gcc.target/msp430/rtx-cost-O3-f5series.c: New test. * gcc.target/msp430/rtx-cost-Os-default.c: New test. * gcc.target/msp430/rtx-cost-Os-f5series.c: New test. --- gcc/config/msp430/msp430.c | 25 ++++++++--- gcc/config/msp430/msp430.h | 8 ++++ .../gcc.target/msp430/rtx-cost-O3-default.c | 42 ++++++++++++++++++ .../gcc.target/msp430/rtx-cost-O3-f5series.c | 38 ++++++++++++++++ .../gcc.target/msp430/rtx-cost-Os-default.c | 43 +++++++++++++++++++ .../gcc.target/msp430/rtx-cost-Os-f5series.c | 38 ++++++++++++++++ 6 files changed, 189 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c create mode 100644 gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c index b7daafcc11a..35ccd2817ca 100644 --- a/gcc/config/msp430/msp430.c +++ b/gcc/config/msp430/msp430.c @@ -1195,11 +1195,6 @@ msp430_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED, return 2 * cost; } -/* BRANCH_COST - Changing from the default of 1 doesn't affect code generation, presumably - because there are no conditional move insns - when a condition is involved, - the only option is to use a cbranch. */ - /* For X, which must be a MEM RTX, return TRUE if it is an indirect memory reference, @Rn or @Rn+. */ static bool @@ -1711,6 +1706,26 @@ msp430_rtx_costs (rtx x, return false; } } + +#undef TARGET_INSN_COST +#define TARGET_INSN_COST msp430_insn_cost + +static int +msp430_insn_cost (rtx_insn *insn, bool speed ATTRIBUTE_UNUSED) +{ + if (recog_memoized (insn) < 0) + return 0; + + /* The returned cost must be relative to COSTS_N_INSNS (1). An insn with a + length of 2 bytes is the smallest possible size and so must be equivalent + to COSTS_N_INSNS (1). */ + return COSTS_N_INSNS (get_attr_length (insn) / 2); + + /* FIXME Add more detailed costs when optimizing for speed. + For now the length of the instruction is a good approximiation and roughly + correlates with cycle cost. */ +} + /* Function Entry and Exit */ diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h index b813e825311..830101408a5 100644 --- a/gcc/config/msp430/msp430.h +++ b/gcc/config/msp430/msp430.h @@ -245,6 +245,14 @@ extern const char *msp430_get_linker_devices_include_path (int, const char **); #define HAS_LONG_COND_BRANCH 0 #define HAS_LONG_UNCOND_BRANCH 0 +/* The cost of a branch sequence is roughly 3 "cheap" instructions. */ +#define BRANCH_COST(speed_p, predictable_p) 3 + +/* Override the default BRANCH_COST heuristic to indicate that it is preferable + to retain short-circuit operations, this results in significantly better + codesize and performance. */ +#define LOGICAL_OP_NON_SHORT_CIRCUIT 0 + #define LOAD_EXTEND_OP(M) ZERO_EXTEND #define WORD_REGISTER_OPERATIONS 1 diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c new file mode 100644 index 00000000000..1bd6a142002 --- /dev/null +++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-default.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +/* Verify the MSP430 cost model is working as expected for the default ISA + (msp430x) and hwmult (none), when compiling at -O3. */ + +char arr[2]; +char a; +char *ptr; + +/* +** foo: +** ... +** MOV.B \&a, \&arr\+1 +** MOV.* #arr\+2, \&ptr +** ... +*/ + +void +foo (void) +{ + arr[1] = a; + ptr = arr + 2; +} + +extern void ext (void); + +/* +** bar: +** ... +** CALL.* #ext +** CALL.* #ext +** ... +*/ + +void +bar (void) +{ + ext (); + ext (); +} diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c new file mode 100644 index 00000000000..1e48625f2e5 --- /dev/null +++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-O3-f5series.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mhwmult=f5series" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +/* Verify the MSP430 cost model is working as expected for the default ISA + (msp430x) and f5series hwmult, when compiling at -O3. */ + +volatile unsigned long a; +volatile unsigned int b; +volatile unsigned long c; +unsigned long res1; +unsigned long res2; +unsigned long res3; + +/* +** foo: +** ... +** MOV.B #16, R14 +** CALL.* #__mspabi_slll +** ... +** MOV.* \&res2.* +** ... +** RLA.*RLC.* +** ... +** MOV.* \&res3.* +** ... +** RLA.*RLC.* +** ... +*/ +void foo (void) +{ + /* Use the shift library function for this. */ + res1 = (a << 16) | b; + /* Emit 7 inline shifts for this. */ + res2 *= 128; + /* Perform this multiplication inline, using addition and shifts. */ + res3 *= 100; +} diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c new file mode 100644 index 00000000000..8f3d1b28049 --- /dev/null +++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-default.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +/* Verify the MSP430 cost model is working as expected for the default ISA + (msp430x) and hwmult (none), when compiling at -Os. */ + +char arr[2]; +char a; +char *ptr; + +/* +** foo: +** ... +** MOV.B \&a, \&arr\+1 +** MOV.* #arr\+2, \&ptr +** ... +*/ + +void +foo (void) +{ + arr[1] = a; + ptr = arr + 2; +} + +extern void ext (void); + +/* +** bar: +** ... +** MOV.* #ext, R10 +** CALL.* R10 +** CALL.* R10 +** ... +*/ + +void +bar (void) +{ + ext (); + ext (); +} diff --git a/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c new file mode 100644 index 00000000000..bb37f9083d9 --- /dev/null +++ b/gcc/testsuite/gcc.target/msp430/rtx-cost-Os-f5series.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-Os -mhwmult=f5series" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +/* Verify the MSP430 cost model is working as expected for the default ISA + (msp430x) and f5series hwmult, when compiling at -Os. */ + +volatile unsigned long a; +volatile unsigned int b; +volatile unsigned long c; +unsigned long res1; +unsigned long res2; +unsigned long res3; + +/* +** foo: +** ... +** MOV.B #16, R14 +** CALL.* #__mspabi_slll +** ... +** MOV.B #7, R14 +** CALL.* #__mspabi_slll +** ... +** MOV.B #100, R14 +** MOV.B #0, R15 +** ... +** CALL.* #__mulsi2_f5 +** ... +*/ +void foo (void) +{ + /* Use the shift library function for this. */ + res1 = (a << 16) | b; + /* Likewise. */ + res2 *= 128; + /* Use the hardware multiply library function for this. */ + res3 *= 100; +} -- 2.27.0