On 12 April 2016 at 22:41, Richard Biener <rguent...@suse.de> wrote: > On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni > <prathamesh.kulka...@linaro.org> wrote: >>Hi, >>How to check if two symbols are from same source file during WPA ? >>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1 >>and symbol2 are from same >>source files ? Would that be a sufficient condition or do I need to >>check for something more ? > > Why would you want to know? The proposed check would verify it comes from > the same TU. Hi, I am trying to address issue with section anchors with LTO. In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1 Therefore we can take advantage of section anchors. IIUC with balanced partitioning, variable is added in the first partition it is referenced in. So it may get separated into other partition from functions referencing it, thus causing it to be loaded as an external reference.
My point is that the "first" partition to which the variable is added may not be the best choice for it. For instance: file1.c defines variables 'a' and 'b' and functions f1(), f2() file2.c defines f3(). f1, f2, f3 use 'a' and 'b'. It could happen that f3, a and b are mapped to one partition (assuming f3 is processed first), and f1, f2 are mapped to another partition. so f1 and f2 are forced to load a, b as external references. It would be better instead to put a and b in same partition as f1 and f2. As first cut I tried to implement the following very simple approach: Assume V(f) gives a set of all variables referenced by function f. Let S = V(f1) intersect V(f2) ... intersect V(fn). Add f1, f2, .. fn and S to same partition if a) S is non-empty, that is, we add functions and "common" variables referenced by these functions to same partition. b) Function uses at least two global variables. If a function uses only a single global variable, then AFAIU it wouldn't matter if it's loaded as external reference. However the above approach results in quite distorted partitioning creating very large partitions and took 4x time to build chromium with LTO. So I tried to restrain adding number of symbols, by allowing them to be added only if they are from same TU. This would (hopefully) ensure that we are not worse than non-LTO case since we add functions and common set of variables referenced by these functions to same partition provided all of them belong to same TU. As a worst-case, it will add all symbols from the TU to which function belongs, if all the functions in that TU reference >=2 global variables and reference at least one global variable in common from the same TU. We might want to put a higher upper limit than 1 for number of shared referenced global vars to stop adding too many symbols to same partition. For eg: file1.c defines variables 'a', 'b' and functions f1_ab(), f2_ab() file2.c: defines f3_ab() where f1_ab, f2_ab, f3_ab both use a, b If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab to the same partition but not f3_ab() since it belongs to another TU. (f3_ab() may get added to the same partition by another iteration if there's enough space in the partition but that's irrelevant). Ideally I would like to put those functions and variables in the same partition such that the use of section anchors can be maximized, which would also involve adding functions from other TU. In above case it's desirable to add f3_ab to the same partition as f1_ab, however I am not sure what constraints to check for since we could potentially end up adding lots of symbols (as in approach 1). For now, I only considered the check if they are from same TU. I have attached a prototype patch that implements the above approach. It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time to build chromium with LTO as without patch. The patch should only affect targets with section anchor support (arm, aarch64, ppc). I have to yet perform benchmarking with the patch, I will get back with some results after I get that done. I would be grateful for feedback especially regarding correctness. Thanks, Prathamesh > > Richard. > >>Thanks, >>Prathamesh > >
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 9eb63c2..c7f17f1 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -34,6 +34,9 @@ along with GCC; see the file COPYING3. If not see #include "ipa-prop.h" #include "ipa-inline.h" #include "lto-partition.h" +#include <set> +#include <map> +#include "toplev.h" /* FIXME: for target_supports_section_anchor_p(). */ vec<ltrans_partition> ltrans_partitions; @@ -407,6 +410,63 @@ add_sorted_nodes (vec<symtab_node *> &next_nodes, ltrans_partition partition) add_symbol_to_partition (partition, node); } +/* Let V(f) be the set of variables referenced by function f and defined in + same TU as f. + Let S = V(f1) interset V(f2) .... intersect V(fn). + Add f1, f2, .. fn and S to set if: + i) S is non-empty + ii) Function uses at-least 2 global variables. + iii) f1, f2, ... fn are from same TU. + + for eg: + file1.c defines 'a', 'b', f1_ab(), f2_ab(). + file2.c defines f3_ab + f1_ab, f2_ab, f3_ab use 'a' and 'b' + get_referenced_cnodes_vnodes (f1_ab) would add {f1_ab, f2_ab, a, b} + to set, but not f3_ab because it is from different TU. + + FIXME: As a worst-case, get_referenced_cnode_vnodes() will add all symbols + from the TU to which cnode belongs, if all the functions in that TU + reference >=2 global variables and reference at least one global + variable in common from the same TU. We might want to put a higher upper + limit than 1 for number of shared referenced global vars to stop adding + too many symbols to same partition. */ + +static void +get_referenced_cnodes_vnodes (cgraph_node *cnode, std::set<symtab_node *>& set) +{ + std::set<varpool_node *> vnodes; + ipa_ref *ref; + varpool_node *vnode; + + /* If cnode is aleady present, don't do anything. */ + if (set.find (cnode) != set.end ()) + return; + + for (int j = 0; cnode->iterate_reference (j, ref); j++) + if ((vnode = dyn_cast<varpool_node *> (ref->referred)) + && !symbol_partitioned_p (vnode) + && vnode->lto_file_data == cnode->lto_file_data) + vnodes.insert (vnode); + + /* Ignore if function uses less than 2 global vars. */ + if (vnodes.size () < 2) + return; + + set.insert (cnode); + for (std::set<varpool_node *>::iterator it = vnodes.begin (); it != vnodes.end (); ++it) + { + vnode = *it; + set.insert (vnode); + cgraph_node *referring_node; + + for (int j = 0; vnode->iterate_referring (j, ref); j++) + if ((referring_node = dyn_cast<cgraph_node *> (ref->referring)) + && !symbol_partitioned_p (referring_node) + && referring_node->lto_file_data == cnode->lto_file_data) + get_referenced_cnodes_vnodes (referring_node, set); + } +} /* Group cgraph nodes into equally-sized partitions. @@ -467,6 +527,7 @@ lto_balanced_map (int n_lto_partitions) int npartitions; int current_order = -1; int noreorder_pos = 0; + std::map<varpool_node *, ltrans_partition> vnode_partition_map; FOR_EACH_VARIABLE (vnode) gcc_assert (!vnode->aux); @@ -613,6 +674,36 @@ lto_balanced_map (int n_lto_partitions) else cost += edge_cost; } + + /* If target supports section anchors, then we probably want to maximize, + usage of section anchors by adding functions and variables referenced by + these functions to same partition. For now, we only add functions and variables + if they are from same TU. This would (hopefully) ensure we are not + worse than non-LTO case. */ + + if (target_supports_section_anchors_p ()) + { + cgraph_node *cnode = as_a<cgraph_node *> (snode); + std::set<symtab_node *> set; + get_referenced_cnodes_vnodes (cnode, set); + + for (std::set<symtab_node *>::const_iterator it = set.begin (); it != set.end (); ++it) + { + symtab_node *node = *it; + if (!symbol_partitioned_p (node) && flag_toplevel_reorder + && !node->no_reorder + && node->get_partitioning_class () == SYMBOL_PARTITION) + add_symbol_to_partition (partition, node); + + int index = lto_symtab_encoder_lookup (partition->encoder, node); + if (index != LCC_NOT_FOUND && index < last_visited_node - 1) + cost--; + else + cost++; + } + + set.clear (); + } } else { @@ -633,7 +724,26 @@ lto_balanced_map (int n_lto_partitions) if (!symbol_partitioned_p (vnode) && flag_toplevel_reorder && !vnode->no_reorder && vnode->get_partitioning_class () == SYMBOL_PARTITION) - add_symbol_to_partition (partition, vnode); + { + if (target_supports_section_anchors_p ()) + { + /* If vnode is from same TU as refs_node, then add it + to the current partition. + If vnode is not from same TU as refs_node, then we + don't want to immediately add vnode to current partition + and give get_referenced_cnodes_vnodes() chance to find a "better" + partition to add it to. We remember to add vnode to current + partition by using vnode_partition_map, in case + get_referenced_cnodes_vnodes() does not add vnode. */ + + if (vnode->lto_file_data == refs_node->lto_file_data) + add_symbol_to_partition (partition, vnode); + else + vnode_partition_map[vnode] = partition; + } + else + add_symbol_to_partition (partition, vnode); + } index = lto_symtab_encoder_lookup (partition->encoder, vnode); if (index != LCC_NOT_FOUND @@ -695,7 +805,6 @@ lto_balanced_map (int n_lto_partitions) cost++; } } - /* If the partition is large enough, start looking for smallest boundary cost. */ if (partition->insns < partition_size * 3 / 4 || best_cost == INT_MAX @@ -758,6 +867,18 @@ lto_balanced_map (int n_lto_partitions) } } + /* All the variables in vnode_partition_map that were not added to any partition by get_referenced_cnodes_vnodes(), + add them to the partitions they were first referenced in. */ + + for (std::map<varpool_node *, ltrans_partition>::const_iterator it = vnode_partition_map.begin (); it != vnode_partition_map.end (); ++it) + { + varpool_node *vnode = it->first; + ltrans_partition part = it->second; + + if (!symbol_partitioned_p (vnode)) + add_symbol_to_partition (part, vnode); + } + next_nodes.truncate (0); /* Varables that are not reachable from the code go into last partition. */ diff --git a/gcc/toplev.c b/gcc/toplev.c index c480bfc..96ea625 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1165,7 +1165,7 @@ general_init (const char *argv0, bool init_signals) /* Return true if the current target supports -fsection-anchors. */ -static bool +bool target_supports_section_anchors_p (void) { if (targetm.min_anchor_offset == 0 && targetm.max_anchor_offset == 0) diff --git a/gcc/toplev.h b/gcc/toplev.h index 0beb06e..63071bc 100644 --- a/gcc/toplev.h +++ b/gcc/toplev.h @@ -98,4 +98,6 @@ extern const char *set_random_seed (const char *); extern void initialize_rtl (void); +bool target_supports_section_anchors_p(void); + #endif /* ! GCC_TOPLEV_H */