Hi Christophe,

On Fri, Jan 15, 2021, at 4:30 AM, Christophe Lyon wrote:
> On Fri, 15 Jan 2021 at 12:39, Daniel Engel <lib...@danielengel.com> wrote:
> >
> > Hi Christophe,
> >
> > On Mon, Jan 11, 2021, at 8:39 AM, Christophe Lyon wrote:
> > > On Mon, 11 Jan 2021 at 17:18, Daniel Engel <lib...@danielengel.com> wrote:
> > > >
> > > > On Mon, Jan 11, 2021, at 8:07 AM, Christophe Lyon wrote:
> > > > > On Sat, 9 Jan 2021 at 14:09, Christophe Lyon
> > > > > <christophe.l...@linaro.org> wrote:
> > > > > >
> > > > > > On Sat, 9 Jan 2021 at 13:27, Daniel Engel <lib...@danielengel.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote:
> > > > > > > > On 07/01/2021 00:59, Daniel Engel wrote:
> > > > > > > > > --snip--
> > > > > > > > >
> > > > > > > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> > > > > > > > > --snip--
> > > > > > > > >
> > > > > > > > >> - finally, your popcount implementations have data in the
> > > > > > > > >> code segment.
> > > > > > > > >> That's going to cause problems when we have compilation
> > > > > > > > >> options such as
> > > > > > > > >> -mpure-code.
> > > > > > > > >
> > > > > > > > > I am just following the precedent of existing lib1funcs (e.g.
> > > > > > > > > __clz2si).
> > > > > > > > > If this matters, you'll need to point in the right direction
> > > > > > > > > for the
> > > > > > > > > fix. I'm not sure it does matter, since these functions are
> > > > > > > > > PIC anyway.
> > > > > > > > >
> > > > > > > > That might be a bug in the clz implementations - Christophe:
> > > > > > > > Any thoughts?
> > > > > > >
> > > > > > > __clzsi2() has test coverage in
> > > > > > > "gcc.c-torture/execute/builtin-bitops-1.c"
> > > > > > Thanks, I'll have a closer look at why I didn't see problems.
> > > > > >
> > > > >
> > > > > So, that's because the code goes to the .text section (as opposed to
> > > > > .text.noread)
> > > > > and does not have the PURECODE flag.
> > > > > The compiler takes care of this
> > > > > when generating code with -mpure-code.
> > > > > And the simulator does not complain because it only checks loads from
> > > > > the segment with the PURECODE flag set.
> > > > >
> > > > This is far out of my depth, but can something like:
> > > >
> > > > ifeq (,$(findstring __symbian__,$(shell $(gcc_compile_bare) -dM -E -
> > > > </dev/null)))
> > > >
> > > > be adapted to:
> > > >
> > > > a) detect the state of the -mpure-code switch, and
> > > > b) pass that flag to the preprocessor?
> > > >
> > > > If so, I can probably fix both the target section and the data usage.
> > > > Just have to add a few instructions to finish unrolling the loop.
> > >
> > > I must confess I never checked libgcc's Makefile deeply before,
> > > but it looks like you can probably detect whether -mpure-code is
> > > part of $CFLAGS.
> > >
> > > However, it might be better to write pure-code-safe code
> > > unconditionally because the toolchain will probably not
> > > be rebuilt with -mpure-code as discussed before.
> > > Or that could mean adding a -mpure-code multilib....
> >
> > I have learned a few things since the last update. I think I know how
> > to get -mpure-code out of CFLAGS and into a macro. However, I have hit
> > something of a wall with testing. I can't seem to compile any flavor of
> > libgcc with CFLAGS_FOR_TARGET="-mpure-code".
> >
> > 1. Configuring --with-multilib-list=rmprofile results in build failure:
> >
> > checking for suffix of object files... configure: error: in
> > `/home/mirdan/gcc-obj/arm-none-eabi/libgcc':
> > configure: error: cannot compute suffix of object files: cannot compile
> > See `config.log' for more details
> >
> > cc1: error: -mpure-code only supports non-pic code on M-profile targets
> >
>
> Yes, I did hit that wall too :-)
>
> Hence what we discussed earlier: the toolchain is not rebuilt with
> -mpure-code.
>
> Note that there are problems in newlib too, but users of -mpure-code seem
> to be able to work around that (eg. using their own startup code and no
> stdlib)

Is there a current side project to solve the makefile problems?

I think I'm back to my original question: If libgcc can't be built with
-mpure-code, and users bypass it completely with -nostdlib, then why this
conversation about pure-code compatibility of __clzsi2() etc?

> > 2. Attempting to filter the multilib list results in configuration error.
> > This might have been misguided, but it was something I tried:
> >
> > Error: --with-multilib-list=armv6s-m not supported.
> >
> > Error: --with-multilib-list=mthumb/march=armv6s-m/mfloat-abi=soft not
> > supported
> I think only 2 values are supported: aprofile and rmprofile.

It looks like this might require a custom t-* multilib in gcc/config/arm.

> > 3. Attempting to configure a single architecture results in a build error.
> >
> > --with-mode=thumb --with-arch=armv6s-m --with-float=soft
> >
> > checking for suffix of object files... configure: error: in
> > `/home/mirdan/gcc-obj/arm-none-eabi/arm/autofp/v5te/fpu/libgcc':
> > configure: error: cannot compute suffix of object files: cannot compile
> > See `config.log' for more details
> >
> > conftest.c:9:10: fatal error: ac_nonexistent.h: No such file or
> > directory
> >     9 | #include <ac_nonexistent.h>
> >       |          ^~~~~~~~~~~~~~~~~~
> I never saw that error message, but I never build using --with-arch.
> I do use --with-cpu though.
>
> > This has me wondering whether pure-code in libgcc is a real issue ...
> > If there's a way to build libgcc with -mpure-code, please enlighten me.
> I haven't done so yet. Maybe building the toolchain --with-cpu=cortex-m0
> works?

No luck with that. Same error message as before:

4. --with-mode=thumb --with-arch=armv6s-m --with-float=soft --with-cpu=cortex-m0

   Switch "--with-arch" may not be used with switch "--with-cpu"

5. Then:

   --with-mode=thumb --with-float=soft --with-cpu=cortex-m0

   checking for suffix of object files...
   configure: error: in
   `/home/mirdan/gcc-obj/arm-none-eabi/arm/autofp/v5te/fpu/libgcc':
   configure: error: cannot compute suffix of object files: cannot compile
   See `config.log' for more details

   cc1: error: -mpure-code only supports non-pic code on M-profile targets

6. Finally!

   --with-float=soft --with-cpu=cortex-m0 --disable-multilib

Once you know this, and read the docs sideways, the previous errors are
all probably "works as designed". But I can still grumble.

With libgcc compiled with -mpure-code, I can confirm that
'builtin-bitops-1.c' (the test for __clzsi2) passed with libgcc as-is.

I then added the SHF_ARM_PURECODE flag to the libgcc assembly functions
and re-ran the test. Still passed.

I then added -mpure-code to RUNTESTFLAGS and re-ran the test. Still
passed. readelf confirmed that the test program is compiling as
expected [1]:

  [ 2] .text             PROGBITS        0000800c 00800c 003314 00 AXy  0   0  4
  Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
    L (link order), O (extra OS processing required), G (group), T (TLS),
    C (compressed), x (unknown), o (OS specific), E (exclude),
    y (purecode), p (processor specific)

It was only when I started inserting pure-code test directives into
'builtin-bitops-1.c' that 'make check' began to report errors.

  /* { dg-do compile } */
  ...
  /* { dg-options "-mpure-code -mfp16-format=ieee" } */
  /* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */

However, for reasons [2] [3] [4] [5], this wasn't actually useful.
Suffice it to say that there are many reasons that non-pure-code
compatible functions exist in libgcc.

Although I'm not sure how useful this will be in light of the previous
findings, I did take the opportunity with a working compile process to
modify the relevant assembly functions for -mpure-code compatibility.
I can manually disassemble the library and verify correct compilation.
I can manually run a non-pure-code builtin-bitops-1 with a pure-code
library to verify correct execution. But I don't think your standard
regression suite will be able to exercise the new paths.

The patch is below; you can consider this as 34/33 in the series.

Regards,
Daniel

[1] It's pretty clear that the section flags in libgcc have never really
mattered. When the linker strings all of the used objects together, the
original sections disappear into a single output object. The compiler
controls those flags regardless of what libgcc does.

[2] The existing pure-code tests are compile-only and cover just the
disassembled 'main.o'. There is no test of a complete executable and
there is no execution/simulation.

[3] While other parts of binutils may understand SHF_ARM_PURECODE, I don't
think the simulator checks section flags or throws exceptions.

[4] builtin-bitops-1 modified this way will always fail due to the array
data definitions (longs, longlongs, etc). GCC can't translate those to
instructions. While the ".data" section would presumably be readable,
scan-assembler-not doesn't know the difference.

[5] Even if the simulator were modified to throw exceptions, this will
continue to fail because _mainCRTStartup uses a literal pool.

> Thanks,
>
> Christophe
>
> > > > > > > The 'clzs' and 'ctz' functions should never have problems.
> > > > > > > -mpure-code
> > > > > > > appears to be valid only when the 'movt' instruction is
> > > > > > > available, which
> > > > > > > means that the 'clz' instruction will also be available, so no
> > > > > > > array loads.
> > > > > > No, -mpure-code is also supported with v6m.
> > > > > >
> > > > > > > Is the -mpure-code state detectable as a preprocessor flag? While
> > > > > > No.
> > > > > >
> > > > > > > 'movw'/'movt' appears to be the canonical solution, I'm not sure
> > > > > > > it
> > > > > > > should be the default just because a processor supports Thumb-2.
> > > > > > >
> > > > > > > Do users wanting to use -mpure-code recompile the toolchain to
> > > > > > > avoid
> > > > > > > constant data in compiled C functions? I don't think this is the
> > > > > > > default for the typical toolchain scripts.
> > > > > > No, users of -mpure-code do not recompile the toolchain.
> > > > > >
> > > > > > --snip --
> > > > > >
> >
> > Thanks,
> > Daniel

Add -mpure-code support to the CM0 functions.

gcc/libgcc/ChangeLog:
2021-01-16  Daniel Engel  <g...@danielengel.com>

	Makefile.in (MPURE_CODE): New macro defines __PURE_CODE__.
	(gcc_compile): Appended MPURE_CODE.
	lib1funcs.S (FUNC_START_SECTION): Set flags for __PURE_CODE__.
	clz2.S (__clzsi2): Added -mpure-code compatible instructions.
	ctz2.S (__ctzsi2): Same.
	popcnt.S (__popcountsi2, __popcountdi2): Same.

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 2de57519734..cd6b5f9c1b0 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -303,6 +303,9 @@ CRTSTUFF_CFLAGS = -O2 $(GCC_CFLAGS) $(INCLUDES) $(MULTILIB_CFLAGS) -g0 \
 # Extra flags to use when compiling crt{begin,end}.o.
 CRTSTUFF_T_CFLAGS =
 
+# Pass the -mpure-code flag into assembly for conditional compilation.
+MPURE_CODE = $(if $(findstring -mpure-code,$(CFLAGS)), -D__PURE_CODE__)
+
 MULTIDIR := $(shell $(CC) $(CFLAGS) -print-multi-directory)
 MULTIOSDIR := $(shell $(CC) $(CFLAGS) -print-multi-os-directory)
 
@@ -312,7 +315,7 @@ inst_slibdir = $(slibdir)$(MULTIOSSUBDIR)
 
 gcc_compile_bare = $(CC) $(INTERNAL_CFLAGS)
 compile_deps = -MT $@ -MD -MP -MF $(basename $@).dep
-gcc_compile = $(gcc_compile_bare) -o $@ $(compile_deps)
+gcc_compile = $(gcc_compile_bare) -o $@ $(compile_deps) $(MPURE_CODE)
 gcc_s_compile = $(gcc_compile) -DSHARED
 
 objects = $(filter %$(objext),$^)
diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S
index a2de45ff651..97a44f5d187 100644
--- a/libgcc/config/arm/clz2.S
+++ b/libgcc/config/arm/clz2.S
@@ -214,17 +214,40 @@ FUNC_ENTRY clzsi2
         IT(sub,ne) r2, #4
 
     LLSYM(__clz2):
+  #if defined(__PURE_CODE__) && __PURE_CODE__
+        // Without access to table data, continue unrolling the loop.
+        lsrs r1, r0, #2
+
+      #ifdef __HAVE_FEATURE_IT
+        do_it ne,t
+      #else
+        beq LLSYM(__clz1)
+      #endif
+
+        // Out of 4 bits, the first '1' is somewhere in the highest 2,
+        // so the lower 2 bits are no longer interesting.
+        IT(mov,ne) r0, r1
+        IT(sub,ne) r2, #2
+
+    LLSYM(__clz1):
+        // Convert remainder {0,1,2,3} to {0,1,2,2}.
+        lsrs r1, r0, #1
+        bics r0, r1
+
+  #else /* !__PURE_CODE__ */
         // Load the remainder by index
         adr r1, LLSYM(__clz_remainder)
         ldrb r0, [r1, r0]
 
+  #endif /* !__PURE_CODE__ */
   #endif /* !__OPTIMIZE_SIZE__ */
 
         // Account for the remainder.
         subs r0, r2, r0
         RET
 
-  #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+  #if !(defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__) && \
+      !(defined(__PURE_CODE__) && __PURE_CODE__)
         .align 2
     LLSYM(__clz_remainder):
         .byte 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4
diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S
index b9528a061a2..6a49d64f3a6 100644
--- a/libgcc/config/arm/ctz2.S
+++ b/libgcc/config/arm/ctz2.S
@@ -209,11 +209,44 @@ FUNC_ENTRY ctzsi2
         IT(sub,ne) r2, #4
 
     LLSYM(__ctz2):
+  #if defined(__PURE_CODE__) && __PURE_CODE__
+        // Without access to table data, continue unrolling the loop.
+        lsls r1, r0, #2
+
+      #ifdef __HAVE_FEATURE_IT
+        do_it ne, t
+      #else
+        beq LLSYM(__ctz1)
+      #endif
+
+        // Out of 4 bits, the first '1' is somewhere in the lowest 2,
+        // so the higher 2 bits are no longer interesting.
+        IT(mov,ne) r0, r1
+        IT(sub,ne) r2, #2
+
+    LLSYM(__ctz1):
+        // Convert remainder {0,1,2,3} in $r0[31:30] to {0,2,1,2}.
+        lsrs r0, #31
+
+      #ifdef __HAVE_FEATURE_IT
+        do_it cs, t
+      #else
+        bcc LLSYM(__ctz_zero)
+      #endif
+
+        // If bit[30] of the remainder is set, neither of these bits count
+        // towards the result. Bit[31] must be cleared.
+        // Otherwise, bit[31] becomes the final remainder.
+        IT(sub,cs) r2, #2
+        IT(eor,cs) r0, r0
+
+  #else /* !__PURE_CODE__ */
         // Look up the remainder by index.
         lsrs r0, #28
         adr r3, LLSYM(__ctz_remainder)
         ldrb r0, [r3, r0]
 
+  #endif /* !__PURE_CODE__ */
   #endif /* !__OPTIMIZE_SIZE__ */
 
     LLSYM(__ctz_zero):
@@ -221,8 +254,9 @@ FUNC_ENTRY ctzsi2
         subs r0, r2, r0
         RET
 
-  #if (!defined(__ARM_FEATURE_CLZ) || !__ARM_FEATURE_CLZ) && \
-      (!defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__)
+  #if !(defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ) && \
+      !(defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__) && \
+      !(defined(__PURE_CODE__) && __PURE_CODE__)
         .align 2
     LLSYM(__ctz_remainder):
         .byte 0,4,3,4,2,4,3,4,1,4,3,4,2,4,3,4
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 5148957144b..59b2370e160 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -454,7 +454,12 @@ SYM (\name):
    Use the *_START_SECTION macros for declarations that the linker should
    place in a non-defailt section (e.g. ".rodata", ".text.subsection").  */
 .macro FUNC_START_SECTION name section
-	.section \section,"x"
+#ifdef __PURE_CODE__
+	/* SHF_ARM_PURECODE | SHF_ALLOC | SHF_EXECINSTR */
+	.section \section,"0x20000006",%progbits
+#else
+	.section \section,"ax",%progbits
+#endif
 	.align 0
 	FUNC_ENTRY \name
 .endm
diff --git a/libgcc/config/arm/popcnt.S b/libgcc/config/arm/popcnt.S
index 51b1ed745ee..d6f65403b5d 100644
--- a/libgcc/config/arm/popcnt.S
+++ b/libgcc/config/arm/popcnt.S
@@ -23,6 +23,29 @@
    <http://www.gnu.org/licenses/>.  */
 
 
+#if defined(L_popcountdi2) || defined(L_popcountsi2)
+
+.macro ldmask reg, temp, value
+    #if defined(__PURE_CODE__) && (__PURE_CODE__)
+      #ifdef NOT_ISA_TARGET_32BIT
+        movs \reg, \value
+        lsls \temp, \reg, #8
+        orrs \reg, \temp
+        lsls \temp, \reg, #16
+        orrs \reg, \temp
+      #else
+        // Assumption: __PURE_CODE__ only supports M-profile.
+        movw \reg, #((\value) * 0x101)
+        movt \reg, #((\value) * 0x101)
+      #endif
+    #else
+        ldr \reg, =((\value) * 0x1010101)
+    #endif
+.endm
+
+#endif
+
+
 #ifdef L_popcountdi2
 
 // int __popcountdi2(int)
@@ -49,7 +72,7 @@ FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
   #else /* !__OPTIMIZE_SIZE__ */
 
         // Load the one-bit alternating mask.
-        ldr r3, =0x55555555
+        ldmask r3, r2, 0x55
 
         // Reduce the second word.
         lsrs r2, r1, #1
@@ -62,7 +85,7 @@ FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
         subs r0, r2
 
         // Load the two-bit alternating mask.
-        ldr r3, =0x33333333
+        ldmask r3, r2, 0x33
 
         // Reduce the second word.
         lsrs r2, r1, #2
@@ -140,7 +163,7 @@ FUNC_ENTRY popcountsi2
   #else /* !__OPTIMIZE_SIZE__ */
 
         // Load the one-bit alternating mask.
-        ldr r3, =0x55555555
+        ldmask r3, r2, 0x55
 
         // Reduce the word.
         lsrs r1, r0, #1
@@ -148,7 +171,7 @@ FUNC_ENTRY popcountsi2
         subs r0, r1
 
         // Load the two-bit alternating mask.
-        ldr r3, =0x33333333
+        ldmask r3, r2, 0x33
 
         // Reduce the word.
         lsrs r1, r0, #2
@@ -158,7 +181,7 @@ FUNC_ENTRY popcountsi2
         adds r0, r1
 
         // Load the four-bit alternating mask.
-        ldr r3, =0x0F0F0F0F
+        ldmask r3, r2, 0x0F
 
         // Reduce the word.
         lsrs r1, r0, #4