On 27.07.2017 14:29, Georg-Johann Lay wrote:
For some targets, the best place to put read-only lookup tables as
generated by -ftree-switch-conversion is not the generic address space
but some target specific address space.
This is the case for AVR, where for most devices .rodata must be
located in RAM.
Part #1 adds a new, optional target hook that queries the back-end
about the desired address space. There is currently only one user:
tree-switch-conversion.c::build_one_array() which re-builds value_type
and array_type if the address space returned by the backend is not
the generic one.
Part #2 is the AVR part that implements the new hook and adds some
sugar around it.
This is the AVR part.
It implements the new hook which returns a convenient flash address
space for devices where .rodata is located in RAM: The 16-bit __flash
for devices with <= 64 KiB flash and 24-bit __memx for > 64 KiB flash.
It adds a new option -madd-space-for-lookup= which allows to pick a
specific address space.
Some new insns and combine-split suport best code generation by the
knowledge that the 24-bit addresses will never point to RAM so that
the expensive decision-at-runtime whether LD or LPM has to be used
can be avoided.
Passed without new regressions on atmega128.
Ok for trunk provided the gcc part 1/2 is approved?
Johann
Implement TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA.
PR target/49857
* config/avr/avr-opts.h: New file.
* config/avr/avr.opt: Include it.
(-maddr-space-for-lookup=): New option and...
(avr_opt_addr_space_for_lookup): ...associated Var.
(avr_aspace_for_lookup): New option enums used by above.
* config/avr/avr-protos.h (avr_out_load_flashx): New proto.
* config/avr/avr.c (avr_out_load_flashx): New function.
* avr_adjust_insn_length [ADJUST_LEN_LOAD_FLASHX]: Handle it.
* avr_rtx_costs_1 [ZERO_EXTEND, SIGN_EXTEND]: Handle
shift-and-extend-by-1 of HI -> PSI.
[ASHIFT,PSImode]: Describe cost of extend-and-shift-by-1.
(TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA): Define to...
(avr_addr_space_for_artificial_rodata): ...this new static function.
* config/avr/avr.md (unspec): Add UNSPEC_LOAD_FLASHX.
(adjust_len): Add load_flashx.
(*ashiftpsi.1_sign_extend.hi, *ashiftpsi.1_zero_extend.hi)
(*extendpsi.ashift.1.uqi, *load<mode>-flashx): New insns.
(*split_xload<mode>-cswtch): New insn-and-split.
* doc/invoke.texi (AVR Options) <-maddr-space-for-lookup=>: Document.
Index: config/avr/avr-opts.h
===================================================================
--- config/avr/avr-opts.h (nonexistent)
+++ config/avr/avr-opts.h (working copy)
@@ -0,0 +1,40 @@
+/* Definitions for option handling for AVR.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+<http://www.gnu.org/licenses/>. */
+
+#ifndef AVR_OPTS_H
+#define AVR_OPTS_H
+
+enum avr_opt_addr_space
+ {
+ AVR_OPT_ADDR_SPACE_flash,
+ AVR_OPT_ADDR_SPACE_flash1,
+ AVR_OPT_ADDR_SPACE_flash2,
+ AVR_OPT_ADDR_SPACE_flash3,
+ AVR_OPT_ADDR_SPACE_flash4,
+ AVR_OPT_ADDR_SPACE_flash5,
+ AVR_OPT_ADDR_SPACE_memx,
+ AVR_OPT_ADDR_SPACE_generic
+ };
+
+#endif /* AVR_OPTS_H */
Index: config/avr/avr-protos.h
===================================================================
--- config/avr/avr-protos.h (revision 250302)
+++ config/avr/avr-protos.h (working copy)
@@ -94,6 +94,7 @@ extern const char* avr_out_plus (rtx, rt
extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL);
extern const char* avr_out_addto_sp (rtx*, int*);
extern const char* avr_out_xload (rtx_insn *, rtx*, int*);
+extern const char* avr_out_load_flashx (rtx_insn*, rtx*, int*);
extern const char* avr_out_movmem (rtx_insn *, rtx*, int*);
extern const char* avr_out_insert_bits (rtx*, int*);
extern bool avr_popcount_each_byte (rtx, int, int);
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c (revision 250302)
+++ config/avr/avr.c (working copy)
@@ -3914,6 +3914,31 @@ avr_out_xload (rtx_insn *insn ATTRIBUTE_
}
+/* Load register XOP[0] with flash content from PSI address XOP[1].
+ We have ELPMX, MOVW, load up to 4 bytes and may clobber Z. */
+
+const char*
+avr_out_load_flashx (rtx_insn*, rtx *xop, int *plen)
+{
+ unsigned n_bytes = GET_MODE_SIZE (GET_MODE (xop[0]));
+ gcc_assert (n_bytes <= 4);
+
+ avr_asm_len ("out __RAMPZ__,%C1", xop, plen, -1);
+ avr_asm_len ("movw ZL,%1", xop, plen, 1);
+
+ avr_asm_len ("elpm %A0,Z+", xop, plen, 1);
+ if (n_bytes >= 2) avr_asm_len ("elpm %B0,Z+", xop, plen, 1);
+ if (n_bytes >= 3) avr_asm_len ("elpm %C0,Z+", xop, plen, 1);
+ if (n_bytes >= 4) avr_asm_len ("elpm %D0,Z+", xop, plen, 1);
+
+ // EBI devices: reset RAMPZ to 0 so that we don't read garbage from RAM.
+ if (AVR_HAVE_RAMPD)
+ avr_asm_len ("out __RAMPZ__,__zero_reg__", xop, plen, 1);
+
+ return "";
+}
+
+
const char*
output_movqi (rtx_insn *insn, rtx operands[], int *plen)
{
@@ -9438,6 +9463,7 @@ avr_adjust_insn_length (rtx_insn *insn,
case ADJUST_LEN_MOV32: output_movsisf (insn, op, &len); break;
case ADJUST_LEN_MOVMEM: avr_out_movmem (insn, op, &len); break;
case ADJUST_LEN_XLOAD: avr_out_xload (insn, op, &len); break;
+ case ADJUST_LEN_LOAD_FLASHX: avr_out_load_flashx (insn, op, &len); break;
case ADJUST_LEN_SEXT: avr_out_sign_extend (insn, op, &len); break;
case ADJUST_LEN_SFRACT: avr_out_fract (insn, op, true, &len); break;
@@ -10837,6 +10863,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
return true;
case ZERO_EXTEND:
+ if (PSImode == mode
+ && ASHIFT == GET_CODE (XEXP (x, 0)))
+ {
+ *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+ return true;
+ }
*total = COSTS_N_INSNS (GET_MODE_SIZE (mode)
- GET_MODE_SIZE (GET_MODE (XEXP (x, 0))));
*total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
@@ -10844,6 +10876,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
return true;
case SIGN_EXTEND:
+ if (PSImode == mode
+ && ASHIFT == GET_CODE (XEXP (x, 0)))
+ {
+ *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+ return true;
+ }
*total = COSTS_N_INSNS (GET_MODE_SIZE (mode) + 2
- GET_MODE_SIZE (GET_MODE (XEXP (x, 0))));
*total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
@@ -11227,6 +11265,18 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
break;
case PSImode:
+ if (SIGN_EXTEND == GET_CODE (XEXP (x, 0))
+ && const1_rtx == XEXP (x, 1))
+ {
+ *total = COSTS_N_INSNS (3);
+ return true;
+ }
+ if (ZERO_EXTEND == GET_CODE (XEXP (x, 0))
+ && const1_rtx == XEXP (x, 1))
+ {
+ *total = COSTS_N_INSNS (4);
+ return true;
+ }
if (!CONST_INT_P (XEXP (x, 1)))
{
*total = COSTS_N_INSNS (!speed ? 6 : 73);
@@ -13341,6 +13391,66 @@ avr_emit3_fix_outputs (rtx (*gen)(rtx,rt
}
+/* Implement `TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA'. */
+
+static addr_space_t
+avr_addr_space_for_artificial_rodata (tree node)
+{
+ if (TREE_CODE (node) == ARRAY_TYPE
+ && INTEGRAL_TYPE_P (TREE_TYPE (node))
+ && int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (node))) > 4)
+ {
+ // Due to reduced addressing mode and DImode being lowered to QI,
+ // avoid a total code bloat by leaving DImode in .rodata.
+ return (addr_space_t) ADDR_SPACE_GENERIC;
+ }
+
+ // Map option to address space proper... due to cumbersome include
+ // order we cannot use address spaces as option enum *sigh*.
+
+ int as = ADDR_SPACE_GENERIC;
+ switch (avr_opt_addr_space_for_lookup)
+ {
+ case AVR_OPT_ADDR_SPACE_memx: as = ADDR_SPACE_MEMX; break;
+ case AVR_OPT_ADDR_SPACE_flash: as = ADDR_SPACE_FLASH; break;
+ case AVR_OPT_ADDR_SPACE_flash1: as = ADDR_SPACE_FLASH1; break;
+ case AVR_OPT_ADDR_SPACE_flash2: as = ADDR_SPACE_FLASH2; break;
+ case AVR_OPT_ADDR_SPACE_flash3: as = ADDR_SPACE_FLASH3; break;
+ case AVR_OPT_ADDR_SPACE_flash4: as = ADDR_SPACE_FLASH4; break;
+ case AVR_OPT_ADDR_SPACE_flash5: as = ADDR_SPACE_FLASH5; break;
+ default:
+ as = ADDR_SPACE_GENERIC;
+ break;
+ }
+
+ if (as == ADDR_SPACE_GENERIC
+#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
+ // If the device has a linear address space and configure found out
+ // that .rodata resides in flash, use .rodata without extra penalty.
+ || avr_arch->flash_pm_offset != 0
+#endif
+ // Reduced Tiny has no address spaces, .rodata is just fine.
+ || AVR_TINY)
+ {
+ return (addr_space_t) ADDR_SPACE_GENERIC;
+ }
+
+ if (avr_n_flash > 1
+ && (AVR_HAVE_ELPM || AVR_HAVE_ELPMX))
+ {
+ if (as == ADDR_SPACE_MEMX
+ || (as == ADDR_SPACE_FLASH1 && avr_n_flash > 1)
+ || (as == ADDR_SPACE_FLASH2 && avr_n_flash > 2)
+ || (as == ADDR_SPACE_FLASH3 && avr_n_flash > 3)
+ || (as == ADDR_SPACE_FLASH4 && avr_n_flash > 4)
+ || (as == ADDR_SPACE_FLASH5 && avr_n_flash > 5))
+ return (addr_space_t) as;
+ }
+
+ return (addr_space_t) ADDR_SPACE_FLASH;
+}
+
+
/* Worker function for movmemhi expander.
XOP[0] Destination as MEM:BLK
XOP[1] Source " "
@@ -14762,6 +14872,10 @@ avr_fold_builtin (tree fndecl, int n_arg
#undef TARGET_ADDR_SPACE_DIAGNOSE_USAGE
#define TARGET_ADDR_SPACE_DIAGNOSE_USAGE avr_addr_space_diagnose_usage
+#undef TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA
+#define TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA \
+ avr_addr_space_for_artificial_rodata
+
#undef TARGET_MODE_DEPENDENT_ADDRESS_P
#define TARGET_MODE_DEPENDENT_ADDRESS_P avr_mode_dependent_address_p
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md (revision 250302)
+++ config/avr/avr.md (working copy)
@@ -78,6 +78,7 @@ (define_c_enum "unspec"
UNSPEC_COPYSIGN
UNSPEC_IDENTITY
UNSPEC_INSERT_BITS
+ UNSPEC_LOAD_FLASHX
UNSPEC_ROUND
])
@@ -158,7 +159,7 @@ (define_attr "adjust_len"
tsthi, tstpsi, tstsi, compare, compare64, call,
mov8, mov16, mov24, mov32, reload_in16, reload_in24, reload_in32,
ufract, sfract, round,
- xload, movmem,
+ xload, load_flashx, movmem,
ashlqi, ashrqi, lshrqi,
ashlhi, ashrhi, lshrhi,
ashlsi, ashrsi, lshrsi,
@@ -1405,6 +1406,99 @@ (define_insn "*addsi3_zero_extend.hi"
[(set_attr "length" "4")
(set_attr "cc" "set_n")])
+(define_insn "*ashiftpsi.1_sign_extend.hi"
+ [(set (match_operand:PSI 0 "register_operand" "=r")
+ (ashift:PSI (sign_extend:PSI (match_operand:HI 1 "register_operand" "0"))
+ (const_int 1)))]
+ ""
+ "lsl %A0\;rol %B0\;sbc %C0,%C0"
+ [(set_attr "length" "3")
+ (set_attr "cc" "clobber")])
+
+(define_insn "*ashiftpsi.1_zero_extend.hi"
+ [(set (match_operand:PSI 0 "register_operand" "=r")
+ (ashift:PSI (zero_extend:PSI (match_operand:HI 1 "register_operand" "0"))
+ (const_int 1)))]
+ ""
+ "lsl %A0\;rol %B0\;clr %C0\;rol %C0"
+ [(set_attr "length" "4")
+ (set_attr "cc" "clobber")])
+
+(define_split ; non-matching
+ [(set (match_operand:PSI 0 "register_operand")
+ (plus:PSI (sign_extend:PSI (ashift:HI (match_operand:HI 1 "register_operand")
+ (const_int 1)))
+ (match_operand:PSI 2 "symbol_ref_operand")))]
+ "!reload_completed"
+ [;; "*ashiftpsi.1_sign_extend.hi"
+ (set (match_dup 0)
+ (ashift:PSI (sign_extend:PSI (match_dup 1))
+ (const_int 1)))
+ ;; "addpsi3"
+ (set (match_dup 0)
+ (plus:PSI (match_dup 0)
+ (match_dup 2)))]
+ {
+ if (strncmp (XSTR (operands[2], 0), "CSWTCH.", strlen ("CSWTCH.")))
+ FAIL;
+ // Ok, this is from tree-switch-conversion. If, in the original
+ // pattern, bit 15 or 14 was set, then the code cooked up by
+ // tree-swich-conversion won't work anyways and we can just as well
+ // swap the shift with the extension.
+ })
+
+
+(define_insn "*extendpsi.ashift.1.uqi"
+ [(set (match_operand:PSI 0 "register_operand" "=r")
+ (any_extend:PSI (ashift:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "0"))
+ (const_int 1))))]
+ ""
+ "lsl %A0\;clr %B0\;rol %B0\;clr %C0"
+ [(set_attr "length" "4")
+ (set_attr "cc" "clobber")])
+
+
+;; We abuse the addition to get grep for a split that transforms an
+;; expensive memx library call to an inline sequence of ELPM Z+.
+;; This is legitimate for lookups created by tree-switch-conversion.
+(define_insn_and_split "*split_xload<mode>-cswtch"
+ [(set (match_operand:MOVMODE 0 "register_operand" "=r")
+ (mem:MOVMODE (plus:PSI (match_operand:PSI 1 "register_operand" "r")
+ (match_operand:PSI 2 "symbol_ref_operand" "i"))))
+ (clobber (reg:MOVMODE 22))
+ (clobber (reg:QI 21))
+ (clobber (reg:HI REG_Z))]
+ "!reload_completed
+ && AVR_HAVE_ELPMX
+ && SYMBOL_REF_P (operands[2])
+ && 0 == strncmp (XSTR (operands[2], 0), \"CSWTCH.\", strlen (\"CSWTCH.\"))"
+ { gcc_unreachable(); }
+ "&& 1"
+ [;; "*load<mode>-flashx"
+ (parallel [(set (match_dup 0)
+ (unspec:MOVMODE [(match_dup 3)
+ ] UNSPEC_LOAD_FLASHX))
+ (clobber (reg:HI REG_Z))])]
+ {
+ rtx set = single_set (curr_insn);
+ rtx mem = SET_SRC (set);
+ // "addpsi3"
+ operands[3] = force_reg (PSImode, XEXP (mem, 0));
+ })
+
+
+(define_insn "*load<mode>-flashx"
+ [(set (match_operand:MOVMODE 0 "register_operand" "=r")
+ (unspec:MOVMODE [(match_operand:PSI 1 "register_operand" "r")
+ ] UNSPEC_LOAD_FLASHX))
+ (clobber (reg:HI REG_Z))]
+ "AVR_HAVE_ELPMX"
+ {
+ return avr_out_load_flashx (insn, operands, NULL);
+ }
+ [(set_attr "adjust_len" "load_flashx")])
+
+
(define_insn "addpsi3"
[(set (match_operand:PSI 0 "register_operand" "=??r,d ,d,r")
(plus:PSI (match_operand:PSI 1 "register_operand" "%0,0 ,0,0")
Index: config/avr/avr.opt
===================================================================
--- config/avr/avr.opt (revision 250302)
+++ config/avr/avr.opt (working copy)
@@ -18,6 +18,9 @@
; along with GCC; see the file COPYING3. If not see
; <http://www.gnu.org/licenses/>.
+HeaderInclude
+config/avr/avr-opts.h
+
mcall-prologues
Target Report Mask(CALL_PROLOGUES)
Use subroutines for function prologues and epilogues.
@@ -82,6 +85,38 @@ mpmem-wrap-around
Target Report
Make the linker relaxation machine assume that a program counter wrap-around occurs.
+maddr-space-for-lookup=
+Target RejectNegative Joined Var(avr_opt_addr_space_for_lookup) Enum(avr_aspace_for_lookup) Init(AVR_OPT_ADDR_SPACE_memx)
+Select the address space that might be used for accessing artificial lookup tables.
+
+Enum
+Name(avr_aspace_for_lookup) Type(enum avr_opt_addr_space)
+Known address space choices (for use with the -maddr-space-for-lookup= option):
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash) Value(AVR_OPT_ADDR_SPACE_flash)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash1) Value(AVR_OPT_ADDR_SPACE_flash1)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash2) Value(AVR_OPT_ADDR_SPACE_flash2)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash3) Value(AVR_OPT_ADDR_SPACE_flash3)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash4) Value(AVR_OPT_ADDR_SPACE_flash4)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash5) Value(AVR_OPT_ADDR_SPACE_flash5)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(memx) Value(AVR_OPT_ADDR_SPACE_memx)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(generic) Value(AVR_OPT_ADDR_SPACE_generic)
+
maccumulate-args
Target Report Mask(ACCUMULATE_OUTGOING_ARGS)
Accumulate outgoing function arguments and acquire/release the needed stack space for outgoing function arguments in function prologue/epilogue. Without this option, outgoing arguments are pushed before calling a function and popped afterwards. This option can lead to reduced code size for functions that call many functions that get their arguments on the stack like, for example printf.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 250302)
+++ doc/invoke.texi (working copy)
@@ -662,7 +662,7 @@ -imacros @var{file} -imultilib @var{dir
@gccoptlist{-mmcu=@var{mcu} -mabsdata -maccumulate-args @gol
-mbranch-cost=@var{cost} @gol
-mcall-prologues -mgas-isr-prologues -mint8 @gol
--mn_flash=@var{size} -mno-interrupts @gol
+-maddr-space-for-lookup=@var{as} -mn_flash=@var{size} -mno-interrupts @gol
-mrelax -mrmw -mstrict-X -mtiny-stack -mfract-convert-truncate @gol
-mshort-calls -nodevicelib @gol
-Waddr-space-convert -Wmisspelled-isr}
@@ -15977,6 +15977,40 @@ This option can lead to reduced code siz
several calls to functions that get their arguments on the stack like
calls to printf-like functions.
+@item -maddr-space-for-lookup=@var{addr-space}
+@opindex maddr-space-for-lookup
+When a simple initialization in a switch statement is converted to an
+initialization from a scalar array, the compiler puts the lookup array
+into @code{.rodata}, which is the generic address space.
+This address space is always used for devices where flash memory is seen
+in the the RAM address range like for @code{avrtiny} and @code{avrxmega3}.
+
+On devices where @code{.rodata} resides in RAM, this option allows to chose
+a different @ref{AVR Named Address Spaces, address space} @var{addr-space}
+which may be one of:
+@table @code
+@item memx
+The default.
+On devices with up to 64@tie{}KiB of program memory, use the 16-bit address
+space @code{__flash}. On devices with more than 64@tie{}KiB of program
+memory, use a 24-bit flash address space.
+@item flash
+@itemx flash1
+@itemx flash2
+@itemx flash3
+@itemx flash4
+@itemx flash5
+Use the 16-bit address space @code{__flash} @dots{} @code{__flash5},
+respectively. If the respective address space is not supported for the device
+because it is beyond its flash size, @code{__flash} is used as a fallback.
+@item generic
+Use the generic address space, i.e. @code{.rodata}.
+@end table
+Other noticeable options in this context are:
+@option{-f[no-]tree-switch-conversion}, @option{-f[no-]jump-tables}
+and @option{--param case-values-threshold=@var{value}},
+@pxref{Optimize Options}.
+
@item -mbranch-cost=@var{cost}
@opindex mbranch-cost
Set the branch costs for conditional branch instructions to