On 08/11/2016 10:59 PM, Andrew Pinski wrote:
On Thu, Aug 11, 2016 at 1:49 PM, Denys Vlasenko <dvlas...@redhat.com> wrote:
falign-functions=N is too simplistic.
Ingo Molnar ran some tests and it looks on latest x86 CPUs, 64-byte alignment
runs fastest (he tried many other possibilites).
However, developers are less than thrilled by the idea of a slam-dunk 64-byte
aligning everything. Too much waste:
On 05/20/2015 02:47 AM, Linus Torvalds wrote:
> At the same time, I have to admit that I abhor a 64-byte function
> alignment, when we have a fair number of functions that are (much)
> smaller than that.
>
> Is there some way to get gcc to take the size of the function into
> account? Because aligning a 16-byte or 32-byte function on a 64-byte
> alignment is just criminally nasty and wasteful.
This change makes it possible to align function to 64-byte boundaries *IF*
this does not introduce huge amount of padding.
Patch drops forced alignment to 8 if requested alignment is higher than 8:
before the patch, -falign-functions=9 was generating
.p2align 4,,8
.p2align 3
which means: "align to 16 if the skip is 8 bytes or less; else align to 8".
After this change, ".p2align 3" is not emitted.
It is dropped because I ultimately want to do something
like -falign-functions=64,8 - IOW, I want to align functions to 64 bytes,
but only if that generates padding of less than 8 bytes - otherwise I want
*no alignment at all*. The forced ".p2align 3" interferes with that intention.
Testing:
tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal paramenters) works exactly
like -falign-functions=N.
Two things I noticed you missed:
1) ChangeLog entries (this is required as per the GNU coding style)
Is this form okay? -
...
Testing:
tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal paramenters) works exactly
like -falign-functions=N.
2016-08-11 Denys Vlasenko <dvlas...@redhat.com>
* common.opt: Change definitions so that -falign_FOO= accepts a string.
* config/i386/freebsd.h: Remove "If N is large, do at least 8 byte
alignment" code.
* config/i386/gnu-user.h: Likewise.
* config/i386/iamcu.h: Likewise.
* config/i386/openbsdelf.h: Likewise.
* config/i386/x86-64.h: Likewise.
* flags.h (struct target_flag_state): Add x_align_functions_max_skip
member.
* toplev.c (parse_N_M): New function.
(init_alignments): Set align_FOO_log, align_FOO, align_FOO_max_skip
from specified --align_FOO=N{,M} option
* varasm.c (assemble_start_function): Use align_functions_max_skip
instead of align_functions - 1.
Index: gcc/common.opt
===================================================================
....
2) Documentation changes.
Index: gcc/common.opt
===================================================================
...
#define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP) \
do { \
if ((LOG) != 0) { \
- if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG)); \
- else { \
+ if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG))) \
+ fprintf ((FILE), "\t.p2align %d\n", (LOG)); \
+ else \
fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
- /* Make sure that we have at least 8 byte alignment if > 8 byte \
- alignment is preferred. */ \
- if ((LOG) > 3 \
- && (1 << (LOG)) > ((MAX_SKIP) + 1) \
- && (MAX_SKIP) >= 7) \
- fputs ("\t.p2align 3\n", (FILE)); \
- }
\
} \
} while (0)
#undef ASM_OUTPUT_MAX_SKIP_PAD
These looks like a bug fix, it most likely should be sent separately;
maybe even first.
ok
Also can you make sure the generic ASM_OUTPUT_MAX_SKIP_ALIGN is done correctly?
If by generic you mean "non-arch-specific", the situation is as follows:
if arch does not define ASM_OUTPUT_MAX_SKIP_ALIGN(file, align, max_skip), then
ASM_OUTPUT_ALIGN_WITH_NOP(file, align) or
ASM_OUTPUT_ALIGN (file, align) macros are used.
As you se from their arguments, those macros don't take "max_skip" argument.
IOW: even in existing code, they ignore max_skip, and always pad to the
specified alignment.
Whether this is considered "correct" or not, is up to you. The patch
changes nothing for those arches.
I checked all arches which do have ASM_OUTPUT_MAX_SKIP_ALIGN.
visium has the same "align at least to 8" logic like many x86 flavors,
do you think it should be removed?
rs6000, aarch64, arm use ".p2align %d,,%d" logic with no special cases,
look good to me.
rx uses ".balign %d,3,%d" and special-cases TARGET_AS100_SYNTAX,
this arch is probably fine too.