Hello,
I would like to submit the following project for Google Summer of Code:
Propagating array data dependence information from Tree-SSA to RTL
Synopsis:
The RTL array data dependence analyzer was written specifically for swing
modulo scheduling (SMS) implementation in GCC. It is overly conservative,
because it uses RTL alias analysis to find intra- and inter-loop memory
dependencies. It also assumes that the distance of an inter-loop memory
dependence equals to one.
I propose to improve the quality of data dependence analysis on RTL via
propagating the information from Tree-SSA dependence analyzer. The saved
information will be used in construction of data dependence graph for
SMS. It can also be used for other optimizations, e.g. scheduler.
Rationale:
In GCC, there are two analyses of array data dependencies, which are run
on Tree-SSA and RTL levels, respectively. The Tree-SSA data dependence
analysis is located in tree-data-ref.[ch] files, also using parts of
tree-chrec.c. For a given loop, the analysis builds a vector of data
references (represented as struct data_reference) and a vector of
dependence relations (represented as struct data_dependence_relation). A
data reference contains links to a memory reference and a container
statement, a first accessed location, a base object, and other memory
attributes. A dependence relation contains the data references it links,
its type, distance vector, direction vector, and subscripts information.
The RTL array data dependence analyzer is located in ddg.[ch] files and
was written specifically for swing modulo scheduling (SMS) implementation
in GCC. The analyzer builds a data dependence graph (DDG) for a given
basic block. The DDG is represented as a vector of nodes. Each DDG node
contains vectors of incoming and outgoing dependence edges, sets of
successors and predecessors of the node in the DDG, and the containing
instruction. Each DDG edge, analogously to the Tree-SSA analysis, contains
source and destination nodes of the edge, a dependence type, an edge
latency, and a distance. Additionally, the edges that are going to/from
the same node form a linked list analogously to control flow edges. The
analyzer uses scheduler dependence analysis (located in sched-deps.c) to
build intra-loop dependencies and the data flow engine (located in df-*.c)
to build inter-loop dependencies.
The RTL analyzer has the following deficiencies:
* DDG is built only for a single basic block loops. This is because the
current SMS implementation only supports such loops. The problem not only
puts additional constraint on the SMS, but also prevents using the
dependence information in other passes.
* Distance of inter-loop dependencies is not calculated and is set to one
conservatively. This limits the SMS implementation in interleaving
instructions from successive iterations.
* Intra-loop dependencies are calculated using RTL alias analysis, which
is weaker (e.g. it is not able to disambiguate array references on
architectures that lack base+offset addressing mode).
I propose to improve the quality of data dependence analysis on RTL via
propagating the information from Tree-SSA dependence analyzer. The
project will consist of the following steps:
* Export the Tree-SSA data dependence graph as a global data structure
(or a field in struct function). The information will be collected before
ivopts to prevent it from turning array references into pointer
references, which badly influences data dependence analysis.
* Create the mapping between RTX mems and original trees. A part of
existing patch[1] of propagating alias information to RTL could be used.
The patch saves links to original trees in mem’s attributes analogously to
MEM_EXPRs.
* Implement the verifier of the consistency of the saved information to
check that it stays intact throughout the RTL pipeline.
* Use the saved information when constructing the data dependence graph
in ddg.c. When two memory references are found to be dependent, a check
whether the MEMs contain the original trees and whether the trees are
array references will be performed. If the trees are indeed ARRAY_REFs
and the information about their dependence relation can be found in the
exported graph, then it is possible either to avoid creating the spurious
DDG edge, in case the data references are independent, or to assign the
correct distance value to the edge, in case this information is present in
the exported graph.
* Provide new testcases and test the patch for correctness and speedups.
I would be pleased to see Ayal Zaks as my mentor, because proposed
improvement is primarily targeted as modulo scheduling improvement. In
case this is not possible, I will seek guidance from Maxim Kuvyrkov.
Please feel free to share your thoughts and suggestions.
[1] Alias export patch by Dmitry Melnik:
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg01518.html
--
Alexander Monakov