We have created a new branch for the incremental parsing work
that Lawrence and I described at the last GCC Summit
(http://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=IncrementalCompiler.pdf).

To get the branch:

$ svn co svn+ssh://gcc.gnu.org/svn/gcc/branches/pph

The purpose of the branch is to explore ways of speeding up C++
compilation.  The approach is to convert the compiler into a
server with a memory cache that will allow it to short-circuit
preprocessing and parsing for files in a translation unit that
have been seen more than once.

The server functionality is still not in the branch.  I have
taken the code from Tom Tromey's incremental branch and will
commit it in the next few days.

The code currently implements an on-disk token cache, enabled
with -fpth (for Pre-Tokenized Headers).  Each file in a
translation unit gets its own .pth image.  When a file is found
unchanged with respect to its .pth image, its tokens are
instantiated from the image instead of from the text stream.
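The check-then-replay step can be illustrated with a minimal C
sketch.  It is not the code on the branch: the `pth_image` layout,
the use of a 64-bit FNV-1a content digest to decide whether a file
is "unchanged", and the name `pth_lookup` are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image layout: a digest of the source text the image
   was built from, plus the tokens produced by lexing that text.  */
struct pth_image {
  uint64_t source_digest;
  const char *const *tokens;
  size_t num_tokens;
};

/* 64-bit FNV-1a hash, used here as a stand-in content digest.  */
static uint64_t fnv1a64(const char *text, size_t len) {
  uint64_t h = 14695981039346656037ULL;
  for (size_t i = 0; i < len; i++) {
    h ^= (unsigned char) text[i];
    h *= 1099511628211ULL;
  }
  return h;
}

/* Return the cached tokens if TEXT is unchanged since IMAGE was
   built; otherwise return NULL so the caller lexes TEXT normally.  */
static const char *const *pth_lookup(const struct pth_image *image,
                                     const char *text,
                                     size_t *num_tokens) {
  assert(image != NULL && text != NULL && num_tokens != NULL);
  if (fnv1a64(text, strlen(text)) != image->source_digest)
    return NULL;
  *num_tokens = image->num_tokens;
  return image->tokens;
}
```

A NULL return means "fall back to the regular lexer"; in this
sketch only the file's own contents decide whether the image is
reused.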

This saves ~15% of C++ compilation time on average.  PTH images
are factored, so a change in one file does not require rebuilding
the complete PTH image for the whole TU.  Additionally, each PTH
file is segmented into token hunks, each of which can be
validated and applied separately.  This allows the same PTH file
to be reused in different translation units.
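Validating a single token hunk might look roughly like the C
sketch below.  This is a hypothetical illustration, not the
branch's format: we assume each hunk records the macros it was
lexed under, and that it may be replayed only if the current
translation unit defines those macros identically.  The names
`struct pth_hunk`, `struct macro_def` and `pth_hunk_valid` are
invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a macro name and its definition (NULL = undefined).  */
struct macro_def {
  const char *name;
  const char *value;
};

/* Hypothetical token hunk: the macro state it was lexed under.  */
struct pth_hunk {
  const struct macro_def *deps;
  size_t num_deps;
};

/* Look NAME up in the current macro table; NULL if undefined.  */
static const char *current_value(const struct macro_def *table,
                                 size_t n, const char *name) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(table[i].name, name) == 0)
      return table[i].value;
  return NULL;
}

/* A hunk is reusable only if every macro it depends on has the same
   definition (or is equally undefined) in the current TU.  */
static bool pth_hunk_valid(const struct pth_hunk *hunk,
                           const struct macro_def *table, size_t n) {
  assert(hunk != NULL);
  for (size_t i = 0; i < hunk->num_deps; i++) {
    const char *want = hunk->deps[i].value;
    const char *have = current_value(table, n, hunk->deps[i].name);
    if (want == NULL || have == NULL) {
      if (want != have)
        return false;
    } else if (strcmp(want, have) != 0) {
      return false;
    }
  }
  return true;
}
```

Because each hunk is checked independently, one translation unit
can reuse the hunks whose macro state matches and re-lex only the
ones that do not.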

The implementation is very primitive, so it breaks easily.

On the parser side, we have only added some instrumentation to
cp/parser.c to determine how effective a parser caching scheme
can be, given the parsing dependencies in C++ applications
(that's the bulk of the results we presented at the Summit).  We
call this Pre-Parsed Headers (PPH).

Most of the changes that we currently have in the parser are
slated to disappear.  We have included them in the initial branch
in case anyone is interested in reproducing the results on their
own code.  There is a companion Python script that builds the
transitive closure of these dependencies to produce coverage
results.  If anyone is interested, I can produce a cleaned-up copy
of that script (it contains many internal references to our
codebase, so I need to scrub it first).
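The script itself is not included here, but its core step,
computing a transitive closure over a dependency graph, can be
sketched in C with Warshall's algorithm.  The representation (a
fixed-size adjacency matrix, `MAX_NODES`, `transitive_closure`) is
a hypothetical choice for illustration, as is what the nodes stand
for (for example, headers or top-level declarations, with an edge
i -> j meaning "i needs j to parse").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64

/* Warshall's algorithm: on return, reach[i][j] is true iff node j is
   reachable from node i through one or more dependency edges.  */
static void transitive_closure(bool reach[MAX_NODES][MAX_NODES],
                               size_t n) {
  assert(n <= MAX_NODES);
  for (size_t k = 0; k < n; k++)
    for (size_t i = 0; i < n; i++)
      if (reach[i][k])
        for (size_t j = 0; j < n; j++)
          if (reach[k][j])
            reach[i][j] = true;
}
```

The O(n^3) matrix form is only meant to keep the example short;
for a large codebase a per-node graph traversal over a sparse
representation would likely be more practical.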

I will post three patches, one for each of the major areas we
changed: libcpp, common gcc files and the C++ parser.  Although
the code is still rough and lacks some comments and
documentation, we would appreciate feedback on the patches.

As we discussed at the Summit, the plan is to experiment with an
implementation of pre-parsed headers to see if the benefits we
expect are realizable.  We only plan to support the C++ front
end, since that is where we are currently experiencing the
biggest slowdowns.  C parsing is barely on the radar for us.

However, if anyone is interested in porting Tom's C parsing
changes to the branch, we would welcome it.

Tom agreed to let us use the incremental compiler wiki page to
host this work.  We will be updating it soon.
