On 11/05/18 17:49, Freddie Chopin wrote:
> On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote:
>> For the Cortex-M devices (and probably many other RISC targets),
>> -fdata-sections comes at a big cost - it effectively blocks
>> -fsection-anchors and makes access to file-static data a lot bigger.
>> People often use -fdata-sections and -ffunction-sections along with
>> -Wl,--gc-sections with the aim of removing unused code and data (and
>> thus saving space, useful on small devices) - I would expect LTO
>> would manage that anyway.  The other purpose of these is to improve
>> locality of reference - again LTO should do that for you.  But even
>> without LTO, I find the cost of -fdata-sections high compared to
>> -fsection-anchors.
>
> Unfortunately having LTO doesn't make -ffunction-sections +
> -fdata-sections + --gc-sections useless.
>
> My test project compiled:
> - without LTO and without these attributes - 150824 B ROM + 4240 B RAM
> - with LTO and without these attributes - 133812 B ROM + 4208 B RAM
> - without LTO and with these attributes - 124456 B ROM + 3484 B RAM
> - with LTO and with these attributes - 120280 B ROM + 3680 B RAM
>
> As you see these attributes give much more than LTO in terms of size.
>

Interesting.  Making these sections and then using gc-sections should
only remove code that is not used - LTO should do that anyway.  Have you
tried with -ffunction-sections and not -fdata-sections?  It is the
-fdata-sections that ruins -fsection-anchors - the -ffunction-sections
doesn't have the same kind of cost.

>
> As for the -fsection-anchors I guess this has no use for non-PIC code
> for arm-none-eabi. Whether I use it or not, the sizes are identical.
>

No, -fsection-anchors has plenty of use for fixed-position eabi code.
Take this little example code:

    static int x;
    static int y;
    static int z;

    void foo(void) {
        int t = x;
        x = y;
        y = z;
        z = t;
    }

Compiled with gcc (4.8, as that's what I had convenient) with -O2
-mcpu=cortex-m4 -mthumb and -fsection-anchors (enabled automatically
with -O2, I believe), this gives:

                  foo:
                      @ args = 0, pretend = 0, frame = 0
                      @ frame_needed = 0, uses_anonymous_args = 0
                      @ link register save eliminated.
    0000 034B         ldr     r3, .L2
    0002 93E80500     ldmia   r3, {r0, r2}
    0006 9968         ldr     r1, [r3, #8]
    0008 1A60         str     r2, [r3]
    000a 9860         str     r0, [r3, #8]
    000c 5960         str     r1, [r3, #4]
    000e 7047         bx      lr
                  .L3:
                      .align  2
                  .L2:
    0010 00000000     .word   .LANCHOR0

                      .bss
                      .align  2
                      .set    .LANCHOR0,. + 0
                  x:
    0000 00000000     .space  4
                  y:
    0004 00000000     .space  4
                  z:
    0008 00000000     .space  4

With -fdata-sections, I get:

                  foo:
                      @ args = 0, pretend = 0, frame = 0
                      @ frame_needed = 0, uses_anonymous_args = 0
                      @ link register save eliminated.
    0000 30B4         push    {r4, r5}
    0002 0549         ldr     r1, .L2
    0004 054B         ldr     r3, .L2+4
    0006 064A         ldr     r2, .L2+8
    0008 0D68         ldr     r5, [r1]
    000a 1468         ldr     r4, [r2]
    000c 1868         ldr     r0, [r3]
    000e 1560         str     r5, [r2]
    0010 1C60         str     r4, [r3]
    0012 0860         str     r0, [r1]
    0014 30BC         pop     {r4, r5}
    0016 7047         bx      lr
                  .L3:
                      .align  2
                  .L2:
    0018 00000000     .word   .LANCHOR0
    001c 00000000     .word   .LANCHOR1
    0020 00000000     .word   .LANCHOR2

                      .section .bss.x,"aw",%nobits
                      .align  2
                      .set    .LANCHOR0,. + 0
                  x:
    0000 00000000     .space  4
                      .section .bss.y,"aw",%nobits
                      .align  2
                      .set    .LANCHOR1,. + 0
                  y:
    0000 00000000     .space  4
                      .section .bss.z,"aw",%nobits
                      .align  2
                      .set    .LANCHOR2,. + 0
                  z:
    0000 00000000     .space  4

The code is clearly bigger and slower (36 bytes including the literal
pool, against 20 for the first version), and it needs three anchor words
in the code section instead of one.

Note that to get similar improvements with non-static data, you need
"-fno-common" - a flag that I believe should be the default for the
compiler.
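
To try the -fno-common point yourself, here is a rough sketch of the same function with the variables given external linkage instead of "static". As far as I understand it, without -fno-common the uninitialised globals are emitted as common symbols, so the compiler cannot address them relative to a shared anchor. The file name test.c and the arm-none-eabi-gcc driver name in the comments are only assumptions for illustration, and I have not included the resulting listings.

```c
/*
 * Same rotation as the static example, but x, y and z have external
 * linkage.  Suggested comparison (test.c is a hypothetical file name):
 *
 *   arm-none-eabi-gcc -O2 -mcpu=cortex-m4 -mthumb -S test.c
 *   arm-none-eabi-gcc -O2 -mcpu=cortex-m4 -mthumb -fno-common -S test.c
 *
 * Compare how many literal-pool words foo() needs in each case.
 */
int x;
int y;
int z;

void foo(void)
{
    int t = x;  /* remember the old x */
    x = y;      /* x takes y */
    y = z;      /* y takes z */
    z = t;      /* z takes the old x */
}
```

The function itself is unchanged; only the linkage of the data differs, so any difference in the generated code comes from how the variables are placed.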