On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote:

> On 14 April 2016 at 13:56, Richard Biener <rguent...@suse.de> wrote:
> > On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote:
> >
> >> On 12 April 2016 at 22:41, Richard Biener <rguent...@suse.de> wrote:
> >> > On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni 
> >> > <prathamesh.kulka...@linaro.org> wrote:
> >> >>Hi,
> >> >>How to check if two symbols are from same source file during WPA ?
> >> >>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
> >> >>and symbol2 are from same
> >> >>source files ? Would that be a sufficient condition or do I need to
> >> >>check for something more ?
> >> >
> >> > Why would you want to know?  The proposed check would verify it comes 
> >> > from the same TU.
> >> Hi,
> >> I am trying to address issue with section anchors with LTO.
> >> In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1
> >> Therefore we can take advantage of section anchors.
> >> IIUC with balanced partitioning, variable is added in the first partition 
> >> it is
> >> referenced in. So it may get separated into other partition from functions
> >> referencing it, thus causing it to be loaded as an external reference.
> >>
> >> My point is that the "first" partition to which the variable is added may 
> >> not
> >> be the best choice for it.
> >> For instance:
> >> file1.c defines variables 'a' and 'b' and functions f1(), f2()
> >> file2.c defines f3().
> >> f1, f2, f3 use 'a' and 'b'.
> >>
> >> It could happen that f3, a and b are mapped to one partition (assuming
> >> f3 is processed first), and f1, f2 are mapped to another partition.
> >> so f1 and f2 are forced to load a, b as external references.
> >> It would be better instead to put  a and b in same partition as f1 and f2.
> >>
> >> As first cut I tried to implement the following very simple approach:
> >> Assume V(f) gives a set of all variables referenced by function f.
> >> Let S = V(f1) intersect V(f2) ... intersect V(fn).
> >> Add f1, f2, .. fn and S to same partition if
> >> a) S is non-empty, that is, we add functions and "common" variables
> >> referenced by these functions to same partition.
> >> b) Function uses at least two global variables. If a function uses
> >> only a single global variable, then AFAIU it wouldn't matter if it's loaded
> >> as external reference.
> >> However the above approach results in quite distorted partitioning
> >> creating very large partitions and took 4x time to build chromium with LTO.
> >>
> >> So I tried to restrain adding number of symbols, by allowing them to be 
> >> added
> >> only if they are from same TU.
> >> This would (hopefully) ensure that we are not worse than non-LTO case
> >> since we add functions and common set of variables referenced by these
> >> functions to same partition provided all of them belong to same TU.
> >> As a worst-case, it will add all symbols from the TU to which function
> >> belongs, if all the functions in
> >> that TU reference >=2 global variables and reference at least one global
> >> variable in common from the same TU. We might want to put a higher upper
> >> limit than 1 for number of shared referenced global vars to stop adding
> >> too many symbols to same partition.
> >>
> >> For eg:
> >> file1.c  defines variables 'a', 'b' and functions f1_ab(), f2_ab()
> >> file2.c: defines f3_ab()
> >> where f1_ab, f2_ab, f3_ab both use a, b
> >> If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab
> >> to the same partition but
> >> not f3_ab() since it belongs to another TU.
> >> (f3_ab() may get added to the same partition by another iteration if
> >> there's enough space in
> >> the partition but that's irrelevant).
> >>
> >> Ideally I would like to put those functions and variables in the same
> >> partition such that the use
> >> of section anchors can be maximized, which would also involve adding
> >> functions from other TU.
> >> In above case it's desirable to add f3_ab to the same
> >> partition as f1_ab, however I am not sure what constraints to check for 
> >> since
> >> we could potentially end up adding lots of symbols (as in approach 1).
> >> For now, I only considered the check if they are from same TU.
> >>
> >> I have attached a prototype patch that implements the above approach.
> >> It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time
> >> to build chromium with LTO as without patch.
> >> The patch should only affect targets with section anchor support (arm,
> >> aarch64, ppc).
> >> I have to yet perform benchmarking with the patch, I will get back
> >> with some results
> >> after I get that done.
> >> I would be grateful for feedback especially regarding correctness.
> >
> > Ok, I didn't try to fully understand what you did but why not simply
> > keep partitioning of functions as-is and simply put variables into
> > the partition which contains the most references to it?
> >
> > Section anchor optimization should be a 2nd choice for partitioning
> > and really partitioning functions together like we do should have priority.
> Well balanced partitioning would reorder functions without considering section
> anchors, so putting variables in partition containing max references may not
> be optimal.
> For eg:
> file1.c defines a, b, and three functions f1, f2, f3 using a, b.
> As a worst case, balanced partitioning algo may put f1, f2, f3
> in separate partitions which would defeat section anchor optimization.
> 
> With the patch, I try to retain {a, b, f1, f2, f3} in same
> partition (other symbols from same TU may go elsewhere).
> The rationale for this is that the effect of section anchors
> remains the same as for non-LTO (although now we would eat up space
> of callees which would be put in the same partition).
> 
> Complication arises when different functions reference only some
> of the global variables.
> for eg:
> file1.c: defines a, b, c and functions f1, f2, f3, f4, f5.
> f1, f2 use a, b and f3, f4, f5 use a, c.
> The patch would add all the symbols {a, b, c, f1, f2, f3, f4, f5} to the
> same partition. As a worst-case this could end up adding
> all symbols from the same TU, which would lean more towards 1to1
> rather than balanced.
> 
> So I suppose we would rather  want to add that subset of symbols that
> has "maximum commonality" ?
> for eg:
> file1.c: defines a, b, c, d and functions:
> f1 -> {a, b, c}
> f2 -> {a, b, c}
> f3 -> {a, b, c, d}
> f4 -> {a, c, d}
> f5 -> {a, b}
> f6 -> {a, c}
> f7 -> {a, b}
> f8 -> {a, c, d}
> 
> Here the pair {a, b, c} is referenced by 2 functions.
> {a, b, c, d} referenced by 1 functions.
> {a, c, d} referenced by 2 functions
> {a, b} referenced by 2 functions
> {a, c} referenced by 1 function.
> 
> So to maximize "commonality", I suppose we would want to put
> {a, b, c, f1, f2} or {a, c, d, f4, f8} in same partition instead of all
> symbols.
> 
> Finding "max commonality"
> 
> Let map<V, F> be a hashtable.
> where V = {set of variables referenced}
> and F = {set of functions referencing the variables in set V}
> 
> for (each function f)
> {
>   V = {variables referenced by f}
>   map[V].insert (f); // insert f in map.F corresponding to map.V
> }
> 
> In above example,
> map would be:
> {a, b, c} -> {f1 f2}
> {a, b, c, d} -> {f3}
> {a, b} -> {f5, f7}
> {a, c, d} -> {f4, f8}
> {a, c} -> {f6}
> 
> We could define max "commonality" then as those entries in map
> which have max number of variables for max number of functions.
> 
> In above eg, entries having maximum number of functions (2):
> {a, b, c} -> {f1, f2}
> {a, b} -> {f5, f7}
> {a, c, d} -> {f4, f8}
> 
> Among these we want to choose entries having max number of variables
> (3), which is:
> {a, b, c} -> {f1, f2}
> {a, c, d} -> {f4, f8}
> Among these we can choose either entry, so we either put {a, b, c, f1, f2} in
> same partition or {a, c, d, f4, f8} in same partition.
> Which I suppose would be better than putting {a, b, c, d, f1, f2, f3,
> f4, f5, f6, f7, f8}
> in same partition as is done currently in the patch.
> 
> I am not sure if that's the best approach.
> for instance if we have 3 functions referencing same 6 variables
> and 4 functions referencing same 2 variables,
> then the above method would choose to put {4 functions 2 variables }
> in same partition which
> might be inferior to putting { 3 functions 6 variables } in same partitions.
> I suppose we could modify the above method to choose those entries
> having number of functions within range [max - CONSTANT, max]
> instead of choosing entries having max number of functions but I am
> not sure what would be a good value for CONSTANT.
> 
> We could also drop the same TU constraint since finding "max" commonality
> doesn't check if symbols are in same TU.
> This interferes with adding functions in callgraph order to same
> partition so perhaps
> we could have more constraints to ensure we don't add many symbols in
> partition boundary.
> 
> Does this sound reasonable ?

What happens in practice?  GCC doesn't put functions in random
partitions.

> If it's not desired by default could we gate it on an option ?
> AFIAU, section anchors optimization is important for ARM and AArch64.

For code size or for performance?  I wonder why section anchors cannot
be "implemented" using some special relocations and thus as linker
optimization.

Richard.

Reply via email to