On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote: > On 14 April 2016 at 13:56, Richard Biener <rguent...@suse.de> wrote: > > On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote: > > > >> On 12 April 2016 at 22:41, Richard Biener <rguent...@suse.de> wrote: > >> > On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni > >> > <prathamesh.kulka...@linaro.org> wrote: > >> >>Hi, > >> >>How to check if two symbols are from same source file during WPA ? > >> >>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1 > >> >>and symbol2 are from same > >> >>source files ? Would that be a sufficient condition or do I need to > >> >>check for something more ? > >> > > >> > Why would you want to know? The proposed check would verify it comes > >> > from the same TU. > >> Hi, > >> I am trying to address issue with section anchors with LTO. > >> In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1 > >> Therefore we can take advantage of section anchors. > >> IIUC with balanced partitioning, variable is added in the first partition > >> it is > >> referenced in. So it may get separated into other partition from functions > >> referencing it, thus causing it to be loaded as an external reference. > >> > >> My point is that the "first" partition to which the variable is added may > >> not > >> be the best choice for it. > >> For instance: > >> file1.c defines variables 'a' and 'b' and functions f1(), f2() > >> file2.c defines f3(). > >> f1, f2, f3 use 'a' and 'b'. > >> > >> It could happen that f3, a and b are mapped to one partition (assuming > >> f3 is processed first), and f1, f2 are mapped to another partition. > >> so f1 and f2 are forced to load a, b as external references. > >> It would be better instead to put a and b in same partition as f1 and f2. > >> > >> As first cut I tried to implement the following very simple approach: > >> Assume V(f) gives a set of all variables referenced by function f. > >> Let S = V(f1) intersect V(f2) ... intersect V(fn). > >> Add f1, f2, .. fn and S to same partition if > >> a) S is non-empty, that is, we add functions and "common" variables > >> referenced by these functions to same partition. > >> b) Function uses at least two global variables. If a function uses > >> only a single global variable, then AFAIU it wouldn't matter if it's loaded > >> as external reference. > >> However the above approach results in quite distorted partitioning > >> creating very large partitions and took 4x time to build chromium with LTO. > >> > >> So I tried to restrain adding number of symbols, by allowing them to be > >> added > >> only if they are from same TU. > >> This would (hopefully) ensure that we are not worse than non-LTO case > >> since we add functions and common set of variables referenced by these > >> functions to same partition provided all of them belong to same TU. > >> As a worst-case, it will add all symbols from the TU to which function > >> belongs, if all the functions in > >> that TU reference >=2 global variables and reference at least one global > >> variable in common from the same TU. We might want to put a higher upper > >> limit than 1 for number of shared referenced global vars to stop adding > >> too many symbols to same partition. > >> > >> For eg: > >> file1.c defines variables 'a', 'b' and functions f1_ab(), f2_ab() > >> file2.c: defines f3_ab() > >> where f1_ab, f2_ab, f3_ab both use a, b > >> If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab > >> to the same partition but > >> not f3_ab() since it belongs to another TU. > >> (f3_ab() may get added to the same partition by another iteration if > >> there's enough space in > >> the partition but that's irrelevant). > >> > >> Ideally I would like to put those functions and variables in the same > >> partition such that the use > >> of section anchors can be maximized, which would also involve adding > >> functions from other TU. > >> In above case it's desirable to add f3_ab to the same > >> partition as f1_ab, however I am not sure what constraints to check for > >> since > >> we could potentially end up adding lots of symbols (as in approach 1). > >> For now, I only considered the check if they are from same TU. > >> > >> I have attached a prototype patch that implements the above approach. > >> It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time > >> to build chromium with LTO as without patch. > >> The patch should only affect targets with section anchor support (arm, > >> aarch64, ppc). > >> I have to yet perform benchmarking with the patch, I will get back > >> with some results > >> after I get that done. > >> I would be grateful for feedback especially regarding correctness. > > > > Ok, I didn't try to fully understand what you did but why not simply > > keep partitioning of functions as-is and simply put variables into > > the partition which contains the most references to it? > > > > Section anchor optimization should be a 2nd choice for partitioning > > and really partitioning functions together like we do should have priority. > Well balanced partitioning would reorder functions without considering section > anchors, so putting variables in partition containing max references may not > be optimal. > For eg: > file1.c defines a, b, and three functions f1, f2, f3 using a, b. > As a worst case, balanced partitioning algo may put f1, f2, f3 > in separate partitions which would defeat section anchor optimization. > > With the patch, I try to retain {a, b, f1, f2, f3} in same > partition (other symbols from same TU may go elsewhere). > The rationale for this is that the effect of section anchors > remains the same as for non-LTO (although now we would eat up space > of callees which would be put in the same partition). > > Complication arises when different functions reference only some > of the global variables. > for eg: > file1.c: defines a, b, c and functions f1, f2, f3, f4, f5. > f1, f2 use a, b and f3, f4, f5 use a, c. > The patch would add all the symbols {a, b, c, f1, f2, f3, f4, f5} to the > same partition. As a worst-case this could end up adding > all symbols from the same TU, which would lean more towards 1to1 > rather than balanced. > > So I suppose we would rather want to add that subset of symbols that > has "maximum commonality" ? > for eg: > file1.c: defines a, b, c, d and functions: > f1 -> {a, b, c} > f2 -> {a, b, c} > f3 -> {a, b, c, d} > f4 -> {a, c, d} > f5 -> {a, b} > f6 -> {a, c} > f7 -> {a, b} > f8 -> {a, c, d} > > Here the pair {a, b, c} is referenced by 2 functions. > {a, b, c, d} referenced by 1 functions. > {a, c, d} referenced by 2 functions > {a, b} referenced by 2 functions > {a, c} referenced by 1 function. > > So to maximize "commonality", I suppose we would want to put > {a, b, c, f1, f2} or {a, c, d, f4, f8} in same partition instead of all > symbols. > > Finding "max commonality" > > Let map<V, F> be a hashtable. > where V = {set of variables referenced} > and F = {set of functions referencing the variables in set V} > > for (each function f) > { > V = {variables referenced by f} > map[V].insert (f); // insert f in map.F corresponding to map.V > } > > In above example, > map would be: > {a, b, c} -> {f1 f2} > {a, b, c, d} -> {f3} > {a, b} -> {f5, f7} > {a, c, d} -> {f4, f8} > {a, c} -> {f6} > > We could define max "commonality" then as those entries in map > which have max number of variables for max number of functions. > > In above eg, entries having maximum number of functions (2): > {a, b, c} -> {f1, f2} > {a, b} -> {f5, f7} > {a, c, d} -> {f4, f8} > > Among these we want to choose entries having max number of variables > (3), which is: > {a, b, c} -> {f1, f2} > {a, c, d} -> {f4, f8} > Among these we can choose either entry, so we either put {a, b, c, f1, f2} in > same partition or {a, c, d, f4, f8} in same partition. > Which I suppose would be better than putting {a, b, c, d, f1, f2, f3, > f4, f5, f6, f7, f8} > in same partition as is done currently in the patch. > > I am not sure if that's the best approach. > for instance if we have 3 functions referencing same 6 variables > and 4 functions referencing same 2 variables, > then the above method would choose to put {4 functions 2 variables } > in same partition which > might be inferior to putting { 3 functions 6 variables } in same partitions. > I suppose we could modify the above method to choose those entries > having number of functions within range [max - CONSTANT, max] > instead of choosing entries having max number of functions but I am > not sure what would be a good value for CONSTANT. > > We could also drop the same TU constraint since finding "max" commonality > doesn't check if symbols are in same TU. > This interferes with adding functions in callgraph order to same > partition so perhaps > we could have more constraints to ensure we don't add many symbols in > partition boundary. > > Does this sound reasonable ?
What happens in practice? GCC doesn't put functions in random partitions. > If it's not desired by default could we gate it on an option ? > AFIAU, section anchors optimization is important for ARM and AArch64. For code size or for performance? I wonder why section anchors cannot be "implemented" using some special relocations and thus as linker optimization. Richard.