On 09/29/2016 04:45 PM, Bernd Schmidt wrote:
On 09/28/2016 02:57 PM, Denys Vlasenko wrote:
No change from past behavior: Tested that "-falign-functions" uses
an arch-dependent alignment. Tested that "-O2" uses an
arch-dependent alignment. Tested that "-O2 -falign-functions=N"
uses explicitly given alignment.
I suspect we may want some testcases to cover this as well as the new
functionality. Look for existing ones that can maybe be adapted.
I added one - testsuite/gcc.target/i386/falign-functions.c
Index: gcc/common/config/i386/i386-common.c
===================================================================
--- gcc/common/config/i386/i386-common.c (revision 239860)
+++ gcc/common/config/i386/i386-common.c (working copy)
@@ -998,36 +998,17 @@ ix86_handle_option (struct gcc_options *opts,
}
return true;
-
- /* Comes from final.c -- no real reason to change it. */
-#define MAX_CODE_ALIGN 16
-
case OPT_malign_loops_:
warning_at (loc, 0, "-malign-loops is obsolete, use -falign-loops");
- if (value > MAX_CODE_ALIGN)
- error_at (loc, "-malign-loops=%d is not between 0 and %d",
- value, MAX_CODE_ALIGN);
- else
- opts->x_align_loops = 1 << value;
return true;
That does seem to be a functional change. I'll defer to Uros.
It would be awkward to translate -malign-loops=%d et al
to comma-separated string format.
Since this warning is there for some 15 years already,
anyone who actually cares should have converted to new options
long ago. With patch, -malign-loops=%d will still emit a warning,
but be ignored. At worst, this would result in not aligning
code as requested via obsolete options.
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c (revision 239860)
+++ gcc/config/arm/arm.c (working copy)
@@ -2899,9 +2899,10 @@ static GTY(()) tree init_optimize;
static void
arm_override_options_after_change_1 (struct gcc_options *opts)
{
- if (opts->x_align_functions <= 0)
+ if (opts->x_flag_align_functions && !opts->x_str_align_functions)
Are these conditions really equivalent? It looks like zero was
the default even when no -falign-functions was specified.
Or is that overriden by init_alignments?
{
- if (opts->x_align_loops == 0)
+ /* -falign-foo without argument: supply one */
+ if (opts->x_flag_align_loops && !opts->x_str_align_loops)
Same here.
The execution flow for option parsing is somewhat convoluted, no doubt.
I found it experimentally that these are locations where
default alignment parameters are set when -falign-functions
is given with no arguments (or when it is implied by -O2).
+@gccoptlist{-faggressive-loop-optimizations @gol
+-falign-functions[=@var{n}[,@var{m},[@var{n}[,@var{m}]]]] @gol
+-falign-jumps[=@var{n}[,@var{m},[@var{n}[,@var{m}]]]] @gol
+-falign-labels[=@var{n}[,@var{m},[@var{n}[,@var{m}]]]] @gol
+-falign-loops[=@var{n}[,@var{m},[@var{n}[,@var{m}]]]] @gol
I think it would be best not to use the same name for different arguments.
Maybe call the second set n2, m2 (everywhere).
@item -falign-functions
@itemx -falign-functions=@var{n}
+@itemx -falign-functions=@var{n},@var{m}
+@itemx -falign-functions=@var{n},@var{m},@var{n},@var{m}
Three args is also valid, isn't it (here and for the other options)?
Yes.
@item -falign-labels
@itemx -falign-labels=@var{n}
+@itemx -falign-labels=@var{n},@var{m}
+@itemx -falign-labels=@var{n},@var{m},@var{n},@var{m}
@opindex falign-labels
+If @var{m} is not specified, it defaults to @var{n}.
Align all branch targets to a power-of-two boundary, skipping up to
-@var{n} bytes like @option{-falign-functions}. This option can easily
+@var{m}-1 bytes like @option{-falign-functions}.
Maybe just write "Align all branch targets to a power-of-two boundary. The options
are exactly as described for @option{-falign-functions}."
Why m-1, by the way, Wouldn't it be simpler to just specify the maximum number
of bytes skipped?
Two reasons.
(1) Because of the defaults. What M defaults to in -falign-labels=N? It's easy
to remember that missing M defaults to N.
The rule "M defaults to N-1" would be more awkward to remember.
(2) This parameter can be seen not as "maximum number of bytes skipped", but as
"minimum number of bytes available before boundary".
IOW: -falign-functions=N,M is
"align to N so as to ensure that at least M [without -1!] bytes can be fetched
before N-byte boundary".
Example: -falign-functions=16 == -falign-functions=16,16 ==
"align to 16 so as to ensure that at least [or in this case, always] 16 bytes
are available before 16-byte boundary".
This is actually a more useful point of view:
in choosing M, you need to decide what is the typical instruction length.
For example, http://lkml.iu.edu/hypermail/linux/kernel/1505.2/03292.html
> defconfig vmlinux (w/o FRAME_POINTER) has 42141 functions.
> 6923 of them have 1st insn 5 or more bytes long,
> 5841 of them have 1st insn 6 or more bytes long,
> 5095 of them have 1st insn 7 or more bytes long,
> 786 of them have 1st insn 8 or more bytes long,
> 548 of them have 1st insn 9 or more bytes long,
> 375 of them have 1st insn 10 or more bytes long,
> 73 of them have 1st insn 11 or more bytes long,
> one of them has 1st insn 12 bytes long
>
> Thus ensuring that at least seven first bytes do not cross
> 64-byte boundary would cover >98% of all functions.
With the current patch logic, if you want at least 7 bytes to be fetched without
crossing the cacheline, you simply set M=7 in -falign-functions=N,M. Not M=7-1.
Not M=7+1. Not M="I forgot that stupid rule again, + or - ???"
+#ifdef SUBALIGN_LOG
We want to avoid adding new #defines; existing ones are being converted
to target hooks. I suspect the best way is to record whether an M value
was specified, and override it in the x86 option_override if required.
If that's infeasible for some reason we can revisit this.
+ /* Before -falign-foo=N,M,N2,M2 was introduced, x86 had a tweak.
+ -falign-functions=N with N > 8 was adding secondary alignment.
+ -falign-functions=10 was emitting this before every function:
+ .p2align 4,,9
+ .p2align 3
+ Now this behavior (and more) can be explicitly requested:
+ -falign-functions=16,10,8
+ Retain old behavior if N2 is missing: */
+ else if (a[0].log > SUBALIGN_LOG)
+ a[1].log = SUBALIGN_LOG;
So this would live in i386.c, probably.
Thanks, I'm working on it...