On Thu, 8 Oct 2020 at 11:58, Richard Earnshaw <richard.earns...@foss.arm.com> wrote: > > On 08/10/2020 10:07, Christophe Lyon via Gcc-patches wrote: > > On Tue, 6 Oct 2020 at 18:02, Richard Earnshaw > > <richard.earns...@foss.arm.com> wrote: > >> > >> On 29/09/2020 20:50, Christophe Lyon via Gcc-patches wrote: > >>> When mi_delta is > 255 and -mpure-code is used, we cannot load delta > >>> from code memory (like we do without -mpure-code). > >>> > >>> This patch builds the value of mi_delta into r3 with a series of > >>> movs/adds/lsls. > >>> > >>> We also do some cleanup by not emitting the function address and delta > >>> via .word directives at the end of the thunk since we don't use them > >>> with -mpure-code. > >>> > >>> No need for new testcases, this bug was already identified by > >>> eg. pr46287-3.C > >>> > >>> 2020-09-29 Christophe Lyon <christophe.l...@linaro.org> > >>> > >>> gcc/ > >>> * config/arm/arm.c (arm_thumb1_mi_thunk): Build mi_delta in r3 and > >>> do not emit function address and delta when -mpure-code is used. > >> > > Hi Richard, > > > > Thanks for your comments. > > > >> There are some optimizations you can make to this code. > >> > >> Firstly, for values between 256 and 510 (inclusive), it would be better > >> to just expand a mov of 255 followed by an add. > > I now see the splitted for the "Pe" constraint which I hadn't noticed > > before, so I can write something similar indeed. > > > > However, I'm note quite sure to understand the benefit in the split > > when -mpure-code is NOT used. > > Consider: > > int f3_1 (void) { return 510; } > > int f3_2 (void) { return 511; } > > Compile with -O2 -mcpu=cortex-m0: > > f3_1: > > movs r0, #255 > > lsls r0, r0, #1 > > bx lr > > f3_2: > > ldr r0, .L4 > > bx lr > > > > The splitter makes the code bigger, does it "compensate" for this by > > not having to load the constant? > > Actually the constant uses 4 more bytes, which should be taken into > > account when comparing code size, > > Yes, the size of the literal pool entry needs to be taken into account. > It might happen that the entry could be shared with another use of that > literal, but in general that's rare. > > > so f3_1 uses 6 bytes, and f3_2 uses 8, so as you say below three > > thumb1 instructions would be equivalent in size compared to loading > > from the literal pool. Should the 256-510 range be extended? > > It's a bit borderline at three instructions when literal pools are not > expensive to use, but in thumb1 literal pools tend to be quite small due > to the limited pc offsets we can use. I think on balance we probably > want to use the instruction sequence unless optimizing for size. > > > > > > >> This is also true for > >> the literal pools alternative as well, so should be handled before all > >> this. > > I am not sure what you mean: with -mpure-code, the above sample is compiled > > as: > > f3_1: > > movs r0, #255 > > lsls r0, r0, #1 > > bx lr > > f3_2: > > movs r0, #1 > > lsls r0, r0, #8 > > adds r0, r0, #255 > > bx lr > > > > so the "return 510" case is already handled as without -mpure-code. > > I was thinking specifically of the thunk sequence where you seem to be > emitting instructions directly rather than generating RTL. The examples > you show here are not thunks. > OK thanks for the clarification.
Here is an updated version, split into 3 patches to hopefully make review easier. They apply on top of my other mpure-code patches for PR96967 and PR96770: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554956.html https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554957.html I kept it this way to make incremental changes easier to understand. Patch 1: With the hope to avoid confusion and make maintenance easier, I have updated thumb1_gen_const_int() so that it can generate either RTL or asm. This way, all the code used to build thumb-1 constants is in the same place, in case we need to improve/fix it later. We now generate shorter sequences in several cases matching your comments. Patch 2: Removes the equivalent loop from thumb1_movsi_insn pattern and calls thumb1_gen_const_int. Patch 3: Update of the original patch in this thread, now calls thumb1_gen_const_int. > > > > >> I also suspect (but haven't check) that the base adjustment will > >> most commonly be a multiple of the machine word size (ie 4). If that is > >> the case then you could generate n/4 and then shift it left by 2 for an > >> even greater range of literals. > > I can see there is provision for this in the !TARGET_THUMB1_ONLY case, > > I'll update my patch. > > > >> More generally, any sequence of up to > >> three thumb1 instructions will be no larger, and probably as fast as the > >> existing literal pool fall back. > >> > >> Secondly, if the value is, for example, 65536 (0x10000), your code will > >> emit a mov followed by two shift-by-8 instructions; the two shifts could > >> be merged into a single shift-by-16. > > > > Right, I'll try to make use of thumb_shiftable_const. > > > >> Finally, I'd really like to see some executable tests for this, if at > >> all possible. > > I mentioned pr46287-3.C, but that's not the only existing testcase > > that showed the problem. There are also: > > g++.dg/opt/thunk1.C > > g++.dg/ipa/pr46984.C > > g++.dg/torture/pr46287.C > > g++.dg/torture/pr45699.C > > > > Do you want that I copy one of these in the arm subdir and add > > -mpure-code in dg-options? > > On reflection, probably not - that just makes things more complicated > with all the dg-options mess (I'm worried about interactions with other > sets of options on the command line and the fall-out from that). If > someone cares about pure-code they should be doing full testsuite runs > with it enabled and that should be sufficient. Yes, that's what I am doing manually, it's a bit tricky, and I use a modified simulator. Thanks, Christophe > > R. > > > > > Thanks, > > > > Christophe > > > >> R. > >> > >>> > >>> k# (use "git pull" to merge the remote branch into yours) > >>> --- > >>> gcc/config/arm/arm.c | 91 > >>> +++++++++++++++++++++++++++++++++++++--------------- > >>> 1 file changed, 66 insertions(+), 25 deletions(-) > >>> > >>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > >>> index ceeb91f..62abeb5 100644 > >>> --- a/gcc/config/arm/arm.c > >>> +++ b/gcc/config/arm/arm.c > >>> @@ -28342,9 +28342,43 @@ arm_thumb1_mi_thunk (FILE *file, tree, > >>> HOST_WIDE_INT delta, > >>> { > >>> if (mi_delta > 255) > >>> { > >>> - fputs ("\tldr\tr3, ", file); > >>> - assemble_name (file, label); > >>> - fputs ("+4\n", file); > >>> + /* With -mpure-code, we cannot load delta from the constant > >>> + pool: we build it explicitly. */ > >>> + if (target_pure_code) > >>> + { > >>> + bool mov_done_p = false; > >>> + int i; > >>> + > >>> + /* Emit upper 3 bytes if needed. */ > >>> + for (i = 0; i < 3; i++) > >>> + { > >>> + int byte = (mi_delta >> (8 * (3 - i))) & 0xff; > >>> + > >>> + if (byte) > >>> + { > >>> + if (mov_done_p) > >>> + asm_fprintf (file, "\tadds\tr3, #%d\n", byte); > >>> + else > >>> + asm_fprintf (file, "\tmovs\tr3, #%d\n", byte); > >>> + mov_done_p = true; > >>> + } > >>> + > >>> + if (mov_done_p) > >>> + asm_fprintf (file, "\tlsls\tr3, #8\n"); > >>> + } > >>> + > >>> + /* Emit lower byte if needed. */ > >>> + if (!mov_done_p) > >>> + asm_fprintf (file, "\tmovs\tr3, #%d\n", mi_delta & 0xff); > >>> + else if (mi_delta & 0xff) > >>> + asm_fprintf (file, "\tadds\tr3, #%d\n", mi_delta & 0xff); > >>> + } > >>> + else > >>> + { > >>> + fputs ("\tldr\tr3, ", file); > >>> + assemble_name (file, label); > >>> + fputs ("+4\n", file); > >>> + } > >>> asm_fprintf (file, "\t%ss\t%r, %r, r3\n", > >>> mi_op, this_regno, this_regno); > >>> } > >>> @@ -28380,30 +28414,37 @@ arm_thumb1_mi_thunk (FILE *file, tree, > >>> HOST_WIDE_INT delta, > >>> fputs ("\tpop\t{r3}\n", file); > >>> > >>> fprintf (file, "\tbx\tr12\n"); > >>> - ASM_OUTPUT_ALIGN (file, 2); > >>> - assemble_name (file, label); > >>> - fputs (":\n", file); > >>> - if (flag_pic) > >>> + > >>> + /* With -mpure-code, we don't need to emit literals for the > >>> + function address and delta since we emitted code to build > >>> + them. */ > >>> + if (!target_pure_code) > >>> { > >>> - /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn". */ > >>> - rtx tem = XEXP (DECL_RTL (function), 0); > >>> - /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC > >>> - pipeline offset is four rather than eight. Adjust the offset > >>> - accordingly. */ > >>> - tem = plus_constant (GET_MODE (tem), tem, > >>> - TARGET_THUMB1_ONLY ? -3 : -7); > >>> - tem = gen_rtx_MINUS (GET_MODE (tem), > >>> - tem, > >>> - gen_rtx_SYMBOL_REF (Pmode, > >>> - ggc_strdup (labelpc))); > >>> - assemble_integer (tem, 4, BITS_PER_WORD, 1); > >>> - } > >>> - else > >>> - /* Output ".word .LTHUNKn". */ > >>> - assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, > >>> 1); > >>> + ASM_OUTPUT_ALIGN (file, 2); > >>> + assemble_name (file, label); > >>> + fputs (":\n", file); > >>> + if (flag_pic) > >>> + { > >>> + /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn". */ > >>> + rtx tem = XEXP (DECL_RTL (function), 0); > >>> + /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the > >>> PC > >>> + pipeline offset is four rather than eight. Adjust the > >>> offset > >>> + accordingly. */ > >>> + tem = plus_constant (GET_MODE (tem), tem, > >>> + TARGET_THUMB1_ONLY ? -3 : -7); > >>> + tem = gen_rtx_MINUS (GET_MODE (tem), > >>> + tem, > >>> + gen_rtx_SYMBOL_REF (Pmode, > >>> + ggc_strdup > >>> (labelpc))); > >>> + assemble_integer (tem, 4, BITS_PER_WORD, 1); > >>> + } > >>> + else > >>> + /* Output ".word .LTHUNKn". */ > >>> + assemble_integer (XEXP (DECL_RTL (function), 0), 4, > >>> BITS_PER_WORD, 1); > >>> > >>> - if (TARGET_THUMB1_ONLY && mi_delta > 255) > >>> - assemble_integer (GEN_INT(mi_delta), 4, BITS_PER_WORD, 1); > >>> + if (TARGET_THUMB1_ONLY && mi_delta > 255) > >>> + assemble_integer (GEN_INT(mi_delta), 4, BITS_PER_WORD, 1); > >>> + } > >>> } > >>> else > >>> { > >>> > >> >
From 99ca30b86d986d4247c736973767f791a6abac44 Mon Sep 17 00:00:00 2001 From: Christophe Lyon <christophe.l...@linaro.org> Date: Fri, 9 Oct 2020 12:42:39 +0000 Subject: [PATCH 1/3] arm: Improve thumb1_gen_const_int Enable thumb1_gen_const_int to generate RTL or asm depending on the context, so that we avoid duplicating code to handle constants in thumb-1 with -mpure-code. When the OP0 parameter is NULL, generate asm in FILE, targeting the REGNO register. When OP0 is not NULL, generate RTL targeting OP0. The generated sequence is improved to handle right-shiftable and small values with less instructions. We now generate: 128: subs r0, r0, #128 264: movs r3, #33 lsls r3, #3 510: movs r3, #255 lsls r3, #1 512: movs r3, #1 lsls r3, #9 764: movs r3, #191 lsls r3, #2 65536: movs r3, #1 lsls r3, #16 0x123456: movs r3, #18 ;0x12 lsls r3, #8 adds r3, #52 ;0x34 lsls r3, #8 adds r3, #86 ;0x56 0x1123456: movs r3, #137 ;0x89 lsls r3, #8 adds r3, #26 ;0x1a lsls r3, #8 adds r3, #43 ;0x2b lsls r3, #1 0x1000010: movs r3, #16 lsls r3, #16 adds r3, #1 lsls r3, #4 0x1000011: movs r3, #1 lsls r3, #24 adds r3, #17 2020-10-12 Christophe Lyon <christophe.l...@linaro.org> gcc/ * config/arm/arm-protos.h (thumb1_gen_const_int): Update prototype. * config/arm/arm.c (thumb1_gen_const_int): Add capability to emit either RTL or asm, improve generated code. * config/arm/thumb1.md: Update splitter to use the new thumb1_gen_const_int prototype. --- gcc/config/arm/arm-protos.h | 2 +- gcc/config/arm/arm.c | 190 ++++++++++++++++++++++++++++++++++++-------- gcc/config/arm/thumb1.md | 2 +- 3 files changed, 160 insertions(+), 34 deletions(-) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 0cc0ae7..673f94b 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -74,7 +74,7 @@ extern bool arm_small_register_classes_for_mode_p (machine_mode); extern int const_ok_for_arm (HOST_WIDE_INT); extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code); extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code); -extern void thumb1_gen_const_int (rtx, HOST_WIDE_INT); +extern void thumb1_gen_const_int (FILE *, int, rtx, HOST_WIDE_INT); extern int arm_split_constant (RTX_CODE, machine_mode, rtx, HOST_WIDE_INT, rtx, rtx, int); extern int legitimate_pic_operand_p (rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index ceeb91f..eb4675a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -4513,38 +4513,6 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code code) } } -/* Emit a sequence of movs/adds/shift to produce a 32-bit constant. - Avoid generating useless code when one of the bytes is zero. */ -void -thumb1_gen_const_int (rtx op0, HOST_WIDE_INT op1) -{ - bool mov_done_p = false; - int i; - - /* Emit upper 3 bytes if needed. */ - for (i = 0; i < 3; i++) - { - int byte = (op1 >> (8 * (3 - i))) & 0xff; - - if (byte) - { - emit_set_insn (op0, mov_done_p - ? gen_rtx_PLUS (SImode,op0, GEN_INT (byte)) - : GEN_INT (byte)); - mov_done_p = true; - } - - if (mov_done_p) - emit_set_insn (op0, gen_rtx_ASHIFT (SImode, op0, GEN_INT (8))); - } - - /* Emit lower byte if needed. */ - if (!mov_done_p) - emit_set_insn (op0, GEN_INT (op1 & 0xff)); - else if (op1 & 0xff) - emit_set_insn (op0, gen_rtx_PLUS (SImode, op0, GEN_INT (op1 & 0xff))); -} - /* Emit a sequence of insns to handle a large constant. CODE is the code of the operation required, it can be any of SET, PLUS, IOR, AND, XOR, MINUS; @@ -28244,6 +28212,164 @@ arm_internal_label (FILE *stream, const char *prefix, unsigned long labelno) default_internal_label (stream, prefix, labelno); } +/* Emit a sequence of movs/adds/shift to produce a 32-bit constant. + Avoid generating useless code when one of the bytes is zero. If + OP0 is NULL, use REGNO as output register, use REGNO(OP0) + otherwise. */ +void +thumb1_gen_const_int (FILE *file, int regno, rtx op0, HOST_WIDE_INT op1) +{ + bool mov_done_p = false; + int val = op1; + int shift = 0; + int i; + + if (val == 0) + { + if (op0) + emit_set_insn (op0, GEN_INT (val)); + else + asm_fprintf (file, "\tmovs\tr%d, #%d\n", regno, val); + return; + } + + /* In the general case, we need 7 instructions to build + a 32 bits constant (1 movs, 3 lsls, 3 adds). We can + do better if VAL is small enough, or + right-shiftable by a suitable amount. If the + right-shift enables to encode at least one less byte, + it's worth it: we save a movs and a lsls at the + expense of a final lsls. */ + int final_shift = number_of_first_bit_set (val); + + int leading_zeroes = clz_hwi (val); + int number_of_bytes_needed = + ((HOST_BITS_PER_WIDE_INT - 1 - leading_zeroes) + / BITS_PER_UNIT) + 1; + int number_of_bytes_needed2 = + ((HOST_BITS_PER_WIDE_INT - 1 - leading_zeroes - final_shift) + / BITS_PER_UNIT) + 1; + + if (number_of_bytes_needed2 < number_of_bytes_needed) + val >>= final_shift; + else + final_shift = 0; + + /* If we are in a very small range, we can use either a single movs + or movs+adds. */ + if ((val >= 0) && (val <= 510)) + { + /* It's possible that VAL is now <= 255, if we + left-shifted it. */ + if (val > 255) + { + int high = val - 255; + + if (op0) + { + emit_set_insn (op0, GEN_INT (high)); + emit_set_insn (op0, gen_rtx_PLUS (SImode, op0, GEN_INT (255))); + } + else + { + asm_fprintf (file, "\tmovs\tr%d, #%d\n", regno, high); + asm_fprintf (file, "\tadds\tr%d, #%d\n", regno, 255); + } + } + else + { + if (op0) + emit_set_insn (op0, GEN_INT (val)); + else + asm_fprintf (file, "\tmovs\tr%d, #%d\n", regno, val); + } + + if (final_shift > 0) + { + if (op0) + emit_set_insn (op0, gen_rtx_ASHIFT (SImode, op0, GEN_INT (final_shift))); + else + asm_fprintf (file, "\tlsls\tr%d, #%d\n", regno, final_shift); + } + } + else + { + /* General case, emit upper 3 bytes as needed. */ + for (i = 0; i < 3; i++) + { + int byte = (val >> (8 * (3 - i))) & 0xff; + + if (byte) + { + /* We are about to emit new bits, stop accumulating a + shift amount, and left-shift only if we have already + emitted some upper bits. */ + if (mov_done_p) + { + if (op0) + emit_set_insn (op0, gen_rtx_ASHIFT (SImode, op0, GEN_INT (shift))); + else + asm_fprintf (file, "\tlsls\tr%d, #%d\n", regno, shift); + } + + if (mov_done_p) + { + if (op0) + emit_set_insn (op0, gen_rtx_PLUS (SImode, op0, GEN_INT (byte))); + else + asm_fprintf (file, "\tadds\tr%d, #%d\n", regno, byte); + } + else + { + if (op0) + emit_set_insn (op0, GEN_INT (byte)); + else + asm_fprintf (file, "\tmovs\tr%d, #%d\n", regno, byte); + } + + /* Stop accumulating shift amount since we've just + emitted some bits. */ + shift = 0; + + mov_done_p = true; + } + + if (mov_done_p) + shift += 8; + } + + /* Emit lower byte; it's not zero because we handled that case above. */ + if (!mov_done_p) + { + if (op0) + emit_set_insn (op0, GEN_INT (val & 0xff)); + else + asm_fprintf (file, "\tmovs\tr%d, #%d\n", regno, val & 0xff); + } + else + { + if (op0) + { + emit_set_insn (op0, gen_rtx_ASHIFT (SImode, op0, GEN_INT (shift))); + emit_set_insn (op0, gen_rtx_PLUS (SImode, op0, GEN_INT (val & 0xff))); + } + else + { + asm_fprintf (file, "\tlsls\tr%d, #%d\n", regno, shift); + asm_fprintf (file, "\tadds\tr%d, #%d\n", regno, val & 0xff); + } + } + + if (final_shift > 0) + { + if (op0) + emit_set_insn (op0, gen_rtx_ASHIFT (SImode, op0, GEN_INT (final_shift))); + else + asm_fprintf (file, "\tlsls\tr%d, #%d\n", regno, final_shift); + } + } +} + /* Output code to add DELTA to the first argument, and then jump to FUNCTION. Used for C++ multiple inheritance. */ diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index 2258a52..406edc5 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -820,7 +820,7 @@ (define_split && !satisfies_constraint_K (operands[1])" [(clobber (const_int 0))] " - thumb1_gen_const_int (operands[0], INTVAL (operands[1])); + thumb1_gen_const_int (NULL, 0, operands[0], INTVAL (operands[1])); DONE; " ) -- 2.7.4
From 6b7d3f02f275ee140227fa293636687a321a21f6 Mon Sep 17 00:00:00 2001 From: Christophe Lyon <christophe.l...@linaro.org> Date: Tue, 29 Sep 2020 19:40:39 +0000 Subject: [PATCH 3/3] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code When -mpure-code is used, we cannot load delta from code memory (like we do without -mpure-code). This patch builds the value of mi_delta into r3 with a series of movs/adds/lsls. We also do some cleanup by not emitting the function address and delta via .word directives at the end of the thunk since we don't use them with -mpure-code. No need for new testcases, this bug was already identified by: g++.dg/ipa/pr46287-3.C g++.dg/ipa/pr46984.C g++.dg/opt/thunk1.C g++.dg/torture/pr46287.C g++.dg/torture/pr45699.C 2020-10-12 Christophe Lyon <christophe.l...@linaro.org> gcc/ * config/arm/arm.c (arm_thumb1_mi_thunk): Build mi_delta in r3 and do not emit function address and delta when -mpure-code is used. --- gcc/config/arm/arm.c | 64 ++++++++++++++++++++++++++++++++-------------------- 1 file changed, 39 insertions(+), 25 deletions(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index eb4675a..ed0828c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28468,9 +28468,16 @@ arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta, { if (mi_delta > 255) { - fputs ("\tldr\tr3, ", file); - assemble_name (file, label); - fputs ("+4\n", file); + /* With -mpure-code, we cannot load MI_DELTA from the + constant pool: we build it explicitly. */ + if (target_pure_code) + thumb1_gen_const_int (file, 3, NULL, mi_delta); + else + { + fputs ("\tldr\tr3, ", file); + assemble_name (file, label); + fputs ("+4\n", file); + } asm_fprintf (file, "\t%ss\t%r, %r, r3\n", mi_op, this_regno, this_regno); } @@ -28506,30 +28513,37 @@ arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta, fputs ("\tpop\t{r3}\n", file); fprintf (file, "\tbx\tr12\n"); - ASM_OUTPUT_ALIGN (file, 2); - assemble_name (file, label); - fputs (":\n", file); - if (flag_pic) + + /* With -mpure-code, we don't need to emit literals for the + function address and delta since we emitted code to build + them. */ + if (!target_pure_code) { - /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn". */ - rtx tem = XEXP (DECL_RTL (function), 0); - /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC - pipeline offset is four rather than eight. Adjust the offset - accordingly. */ - tem = plus_constant (GET_MODE (tem), tem, - TARGET_THUMB1_ONLY ? -3 : -7); - tem = gen_rtx_MINUS (GET_MODE (tem), - tem, - gen_rtx_SYMBOL_REF (Pmode, - ggc_strdup (labelpc))); - assemble_integer (tem, 4, BITS_PER_WORD, 1); - } - else - /* Output ".word .LTHUNKn". */ - assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, 1); + ASM_OUTPUT_ALIGN (file, 2); + assemble_name (file, label); + fputs (":\n", file); + if (flag_pic) + { + /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn". */ + rtx tem = XEXP (DECL_RTL (function), 0); + /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC + pipeline offset is four rather than eight. Adjust the offset + accordingly. */ + tem = plus_constant (GET_MODE (tem), tem, + TARGET_THUMB1_ONLY ? -3 : -7); + tem = gen_rtx_MINUS (GET_MODE (tem), + tem, + gen_rtx_SYMBOL_REF (Pmode, + ggc_strdup (labelpc))); + assemble_integer (tem, 4, BITS_PER_WORD, 1); + } + else + /* Output ".word .LTHUNKn". */ + assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, 1); - if (TARGET_THUMB1_ONLY && mi_delta > 255) - assemble_integer (GEN_INT(mi_delta), 4, BITS_PER_WORD, 1); + if (TARGET_THUMB1_ONLY && mi_delta > 255) + assemble_integer (GEN_INT(mi_delta), 4, BITS_PER_WORD, 1); + } } else { -- 2.7.4
From 31b09a3cf77b7731d17c5ff85d569481fc831bc7 Mon Sep 17 00:00:00 2001 From: Christophe Lyon <christophe.l...@linaro.org> Date: Fri, 9 Oct 2020 15:18:07 +0000 Subject: [PATCH 2/3] arm: Call thumb1_gen_const_int from thumb1_movsi_insn thumb1_movsi_insn used the same algorithm to build a constant in asm than thumb1_gen_const_int does in RTL. Since the previous patch added support for asm generation in thumb1_gen_const_int, this patch calls it from thumb1_movsi_insn to avoid duplication. 2020-10-12 Christophe Lyon <christophe.l...@linaro.org> gcc/ * config/arm/thumb1.md (thumb1_movsi_insn): Call thumb1_movsi_insn. --- gcc/config/arm/thumb1.md | 37 ++++--------------------------------- 1 file changed, 4 insertions(+), 33 deletions(-) diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index 406edc5..83499dd 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -688,40 +688,11 @@ (define_insn "*thumb1_movsi_insn" } else if (GET_CODE (operands[1]) == CONST_INT) { - int i; - HOST_WIDE_INT op1 = INTVAL (operands[1]); - bool mov_done_p = false; - rtx ops[2]; - ops[0] = operands[0]; - - /* Emit upper 3 bytes if needed. */ - for (i = 0; i < 3; i++) - { - int byte = (op1 >> (8 * (3 - i))) & 0xff; - - if (byte) - { - ops[1] = GEN_INT (byte); - if (mov_done_p) - output_asm_insn ("adds\t%0, %1", ops); - else - output_asm_insn ("movs\t%0, %1", ops); - mov_done_p = true; - } - - if (mov_done_p) - output_asm_insn ("lsls\t%0, #8", ops); - } - - /* Emit lower byte if needed. */ - ops[1] = GEN_INT (op1 & 0xff); - if (!mov_done_p) - output_asm_insn ("movs\t%0, %1", ops); - else if (op1 & 0xff) - output_asm_insn ("adds\t%0, %1", ops); - return ""; + thumb1_gen_const_int (NULL, 0, operands[0], INTVAL (operands[1])); + return \"\"; } - gcc_unreachable (); + + gcc_unreachable (); case 8: return "ldr\t%0, %1"; case 9: return "str\t%1, %0"; -- 2.7.4