Hello,

I would like to submit the following project for Google Summer of Code:

Propagating array data dependence information from Tree-SSA to RTL

Synopsis:

The RTL array data dependence analyzer was written specifically for swing modulo scheduling (SMS) implementation in GCC. It is overly conservative, because it uses RTL alias analysis to find intra- and inter-loop memory dependencies. It also assumes that the distance of an inter-loop memory dependence equals to one.

I propose to improve the quality of data dependence analysis on RTL via propagating the information from Tree-SSA dependence analyzer. The saved information will be used in construction of data dependence graph for SMS. It can also be used for other optimizations, e.g. scheduler.

Rationale:

In GCC, there are two analyses of array data dependencies, which are run on Tree-SSA and RTL levels, respectively. The Tree-SSA data dependence analysis is located in tree-data-ref.[ch] files, also using parts of tree-chrec.c. For a given loop, the analysis builds a vector of data references (represented as struct data_reference) and a vector of dependence relations (represented as struct data_dependence_relation). A data reference contains links to a memory reference and a container statement, a first accessed location, a base object, and other memory attributes. A dependence relation contains the data references it links, its type, distance vector, direction vector, and subscripts information.

The RTL array data dependence analyzer is located in ddg.[ch] files and was written specifically for swing modulo scheduling (SMS) implementation in GCC. The analyzer builds a data dependence graph (DDG) for a given basic block. The DDG is represented as a vector of nodes. Each DDG node contains vectors of incoming and outgoing dependence edges, sets of successors and predecessors of the node in the DDG, and the containing instruction. Each DDG edge, analogously to the Tree-SSA analysis, contains source and destination nodes of the edge, a dependence type, an edge latency, and a distance. Additionally, the edges that are going to/from the same node form a linked list analogously to control flow edges. The analyzer uses scheduler dependence analysis (located in sched-deps.c) to build intra-loop dependencies and the data flow engine (located in df-*.c) to build inter-loop dependencies.

The RTL analyzer has the following deficiencies:

* DDG is built only for a single basic block loops. This is because the current SMS implementation only supports such loops. The problem not only puts additional constraint on the SMS, but also prevents using the dependence information in other passes.

* Distance of inter-loop dependencies is not calculated and is set to one conservatively. This limits the SMS implementation in interleaving instructions from successive iterations.

* Intra-loop dependencies are calculated using RTL alias analysis, which is weaker (e.g. it is not able to disambiguate array references on architectures that lack base+offset addressing mode).

I propose to improve the quality of data dependence analysis on RTL via propagating the information from Tree-SSA dependence analyzer. The project will consist of the following steps:

* Export the Tree-SSA data dependence graph as a global data structure (or a field in struct function). The information will be collected before ivopts to prevent it from turning array references into pointer references, which badly influences data dependence analysis.

* Create the mapping between RTX mems and original trees. A part of existing patch[1] of propagating alias information to RTL could be used. The patch saves links to original trees in mem’s attributes analogously to MEM_EXPRs.

* Implement the verifier of the consistency of the saved information to check that it stays intact throughout the RTL pipeline.

* Use the saved information when constructing the data dependence graph in ddg.c. When two memory references are found to be dependent, a check whether the MEMs contain the original trees and whether the trees are array references will be performed. If the trees are indeed ARRAY_REFs and the information about their dependence relation can be found in the exported graph, then it is possible either to avoid creating the spurious DDG edge, in case the data references are independent, or to assign the correct distance value to the edge, in case this information is present in the exported graph.

 * Provide new testcases and test the patch for correctness and speedups.

I would be pleased to see Ayal Zaks as my mentor, because proposed improvement is primarily targeted as modulo scheduling improvement. In case this is not possible, I will seek guidance from Maxim Kuvyrkov.

Please feel free to share your thoughts and suggestions.

[1] Alias export patch by Dmitry Melnik: http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01518.html

--
Alexander Monakov

Reply via email to