LLVM/GCC Integration Proposal

Chris Lattner Fri, 18 Nov 2005 17:51:02 -0800

Hi Everyone,

At the request of several members of the GCC community, I'm writing this email to describe some of my short-term plans with GCC and describe an alternative to the recent link-time optimization [1] and code generator rewrite [2] proposals.

For those who are not familiar with me, I'm one of the main developers working on the LLVM project (http://llvm.org/). One important way that LLVM is currently used is as a back-end for GCC. In this role, it provides a static optimizer, interprocedural link-time optimizer, JIT support, and several other features. Until recently, LLVM has only been loosely integrated with an old version of GCC (a 3.4 prerelease), which limited its effectiveness.

Recently, at Apple, I have been working on a new version of the llvm-gcc translation layer, built on GCC 4. This implementation links the LLVM optimizers and code generator directly into the GCC process, replacing the tree-ssa optimizers and the RTL code generator with the corresponding LLVM components when enabled. The end result is a compiler that is command line compatible with GCC: 'gcc -S t.c -o t.s' does exactly what you'd expect, and most standard command line options are supported (those that aren't are very specific to the design of the RTL backend, which we just ignore). I plan to have this work committed to the Apple branch within a month.

Though not yet implemented, we intend to support link-time optimization with many design constraints that match the recent proposal [1]. Because LLVM already supports link-time optimization and has an architecture that makes it straightforward, this work mainly amounts to changes in the GCC compiler-driver. If you're interested in the link-time IPO architecture, there are documents that describe the high level ideas [10,11] with some (potentially out of date) implementation information.

In this email, I want to briefly talk about the strengths that LLVM brings to the table, some of the implementation details of my integration work, and some of our important ongoing work, and to answer some of the hot questions that will inevitably come up. :)



==== Strengths of LLVM

LLVM is a modern and efficient compiler framework built out of libraries with well defined APIs. As I mentioned above, LLVM provides an optimizer and code generator. It also provides several other components I won't discuss here. If you are interested, please see LLVM's extensive documentation [9] for more information.

The LLVM mid-level and interprocedural optimizers work on a common representation (the LLVM IR), which is a three-address SSA-based representation that is somewhat similar to GIMPLE. It is fully specified [3], is easy to analyze and manipulate [4], and is very memory efficient (for example, it takes about 50M of memory on a 32-bit host to hold the IR for all of 176.gcc, which is about 230K LOC). The IR has a text form (suitable for compiler dumps and fine-grained regression testing) and a 100% equivalent compressed binary form (suitable for interchange between compile-time and link-time optimization steps), both of which correspond to the in-memory IR.
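
As an illustration of what "three-address SSA" means in practice, the following Python sketch lowers a nested expression into instructions that each take at most two operands and define exactly one fresh name. It is not LLVM code: the Instr class, the lower_expression helper, and the printed notation are all invented for this example, and the real textual syntax is the one specified in [3].

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Instr:
    """One three-address instruction (hypothetical notation, not LLVM syntax)."""
    dest: str  # the single name this instruction defines
    op: str    # opcode, e.g. "add" or "mul"
    lhs: str   # first operand: a name or a constant
    rhs: str   # second operand: a name or a constant

    def __str__(self):
        return f"{self.dest} = {self.op} {self.lhs}, {self.rhs}"


OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def lower(expr, code, fresh):
    """Append instructions for expr to code and return the name holding its value.

    expr is either a string leaf or a tuple (operator, left, right).
    """
    if isinstance(expr, str):
        return expr
    operator, left, right = expr
    lhs = lower(left, code, fresh)
    rhs = lower(right, code, fresh)
    dest = f"%t{next(fresh)}"  # every result gets a brand-new name (SSA)
    code.append(Instr(dest, OPCODES[operator], lhs, rhs))
    return dest


def lower_expression(expr):
    """Lower one expression tree; returns (instructions, result name)."""
    code = []
    result = lower(expr, code, count(1))
    return code, result


# (a * b) + (a * 4)
code, result = lower_expression(("+", ("*", "%a", "%b"), ("*", "%a", "4")))
for instr in code:
    print(instr)
```

Running it prints three instructions: `%t1 = mul %a, %b`, `%t2 = mul %a, 4`, and `%t3 = add %t1, %t2`. No name is assigned more than once, which is the property that SSA-based analyses depend on.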

The IR supports several features that are useful to various communities, including true tail calls, accurate garbage collection, etc. The IR is continuously evolving to add new features, and we plan several extensions in the future.

The optimizer itself has a full suite of standard scalar optimizations and also includes a collection of interprocedural optimizations and interprocedural analyses, some of which are quite aggressive [5] (though this particular set is not enabled by default). The optimizer is fully modular, which allows us to have nice tools for working with the IR and optimizer, including an automated bug finding tool [6] which makes tracking down miscompilations and ICEs really easy.

The LLVM code generator is built on modern techniques. For example, it uses pattern-matching DAG-based instruction selection, maintains SSA form up until register allocation, represents code in a form quite similar to the "compressed RTL" proposal [7], and supports dynamically loaded targets. We currently have stable code generators for PowerPC and X86, with targets for Alpha, IA-64, and Sparc also available, but less stable. LLVM can also emit C code, which allows it to support systems for which we do not have a native code generator implemented.
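
To give a rough feel for pattern-matching instruction selection, here is a minimal Python sketch that tries the largest pattern first, so that add(mul(x, y), z) becomes a single multiply-add. It is simplified to expression trees (a real DAG-based selector also has to deal with shared subexpressions), and the MADD, ADD, and MUL mnemonics, the virtual register names, and the select_instructions helper are all assumptions made for illustration, not LLVM's actual selector or any real target description.

```python
from itertools import count


def select(node, out, fresh):
    """Emit instructions for node into out; return the register holding its value.

    node is either a string (a value already in a register) or a tuple
    (op, left, right) with op in {"add", "mul"}.
    """
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    # Largest pattern first: add(mul(x, y), z) maps to one multiply-add.
    if op == "add" and not isinstance(lhs, str) and lhs[0] == "mul":
        x = select(lhs[1], out, fresh)
        y = select(lhs[2], out, fresh)
        z = select(rhs, out, fresh)
        dest = f"v{next(fresh)}"
        out.append(f"{dest} = MADD {x}, {y}, {z}")
        return dest
    # Fallback: one generic instruction per node.
    a = select(lhs, out, fresh)
    b = select(rhs, out, fresh)
    dest = f"v{next(fresh)}"
    out.append(f"{dest} = {op.upper()} {a}, {b}")
    return dest


def select_instructions(tree):
    """Select instructions for one expression tree; returns (instructions, result)."""
    out = []
    result = select(tree, out, count(1))
    return out, result


# (a * b) + (c * d): the left multiply is folded into the MADD, the right is not.
instructions, result = select_instructions(("add", ("mul", "a", "b"), ("mul", "c", "d")))
for line in instructions:
    print(line)
```

For the tree above this emits `v1 = MUL c, d` followed by `v2 = MADD a, b, v1`, two instructions instead of the three a one-node-at-a-time translation would produce.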

The design of the register allocation components is very close to that proposed in Andrew MacLeod's recent proposal [2]. We have separate instruction selection, global coalescing, and register allocation stages, have a spiller that does simple local register allocation, etc. We currently support multiple pluggable register allocators, to (for example) support quick -O0 compiles, and to support targets that want to have completely custom allocators. The spiller is currently capable of folding spill code into instructions, but does not support rematerialization yet.



==== Implementation Details

My current work involves integrating the LLVM optimizer and code generator into GCC as mentioned above. Here is a basic sketch for the major pieces this entails:


1. The build system is taught about C++ code.
2. A configure option (--enable-llvm) has been added to GCC's configure
   script.  This enables an ENABLE_LLVM #define.  Every LLVM-specific
   piece of code is protected by this macro.  Compiling without
   --enable-llvm results in no behavior change.
3. Main is compiled as a C++ function, if ENABLE_LLVM is set.
4. A new header file, llvm.h, defines a small number of C functions which
   the extant GCC code calls into.  For example, tree_rest_of_compilation
   calls through this interface to compile a function body.  The GCC C
   code is all still compiled with a C compiler.  These calls are not
   particularly invasive, and are all protected with ENABLE_LLVM as
   described above.
5. The LLVM interface is implemented with 4 C++ files, which are currently
   about 4000 lines of code.  This code basically translates GIMPLE into
   LLVM.
6. The binary links in the standard LLVM libraries, including optimizations
   and code generators taken from an LLVM build tree.

This design point makes the work a completely optional opt-in component with a low impact on the existing GCC source base. We clearly do not support all of the targets that GCC does, so those unsupported ones will not be able to use LLVM (currently; see below for more information).



==== Ongoing Work

While it has many virtues, LLVM is still not complete, and we are investing in adding the missing features. In particular, this system is missing two key features to be truly useful: debug info support and inline asm support. Apple is currently investing in adding both these features, as well as adding vector support.

In the future, I anticipate work to extend LLVM to be capable of capturing higher-level information (e.g. that needed for dynamic dispatch optimizations and more alias information), as well as supporting new targets and the usual collection of miscellaneous improvements to the current system.

The project is in early stages of development, but code performance and compile times are currently comparable to an unmodified GCC compiler on PowerPC.



==== Answers to some hot questions

* What about licensing and copyright assignment issues?

The patch I'm working on is GPL licensed and copyright will be assigned to the FSF under the standard Apple copyright assignment. Initially, I intend to link the LLVM libraries in from the existing LLVM distribution, mainly to simplify my work. This code is licensed under a BSD-like license [8], and LLVM itself will not initially be assigned to the FSF. If people are seriously in favor of LLVM being a long-term part of GCC, I personally believe that the LLVM community would agree to assign the copyright of LLVM itself to the FSF and we can work through these details.


* LLVM is written in C++, eww yuck!

It is true. However, the C++ness only exists in the LLVM portions of the resultant compiler; no other parts have to be converted to C++ (well, except for main). Though there is a lot of potential to misuse C++ in horrible ways (it is a sharp tool after all), LLVM has been quite successful at using it for good, not evil. While I don't expect everyone to like the use of C++, others will hopefully respect that it allows us to build good APIs, increase modularity, and get more done in less time.


* LLVM is missing support for representing high-level information!

First, this isn't true. LLVM does capture some important pieces of high-level information, and is continuing to add more. There is a fundamental design tension here between building a compiler that supports multiple languages (in a link-time context, that means it can link translation units from multiple languages together and optimize across the boundary) and building a compiler that supports only one.

Specifically, advocates of the recent link-time-optimization proposal [1] may claim that exposing all of the information that DWARF captures (e.g. about the type system) at link-time is a good thing: it makes it easier to do specific high-level optimizations, because all of the information is trivially available. However, they are ignoring the issue of handling multiple languages at the same time: an optimizer with this design has to be aware of (and be able to update) all possible source-language information and must be extended in the future as new languages are added (this is the problem with universal compilers, dating back to the UNCOL design). Also, the linker needs to know how to combine declarations defined in different languages, because an interprocedural optimizer only wants to see one declaration for each logical variable (for example). This quickly becomes difficult and messy, which is presumably why the link-time proposal allows the linker to "give up" linking two translation units.

In contrast, LLVM has historically leaned the other way: it exposes only the *important* high-level information in ways that are useful to the optimizers. As we find that we need more information, we extend it to capture this information as needed. This allows the optimizers to be simple and forces us to think about the key aspects of the information we need to capture before we add it. LLVM will continue to evolve this way, being extended to support new information as it is needed.


* LLVM is missing X, Y, Z feature!

We are continuing to improve and extend LLVM. As I mentioned above, three of the most important features for Apple (debug info, inline asm, and vector support) are on the short-term todo list. Others will be tackled in time (and help is always appreciated!).


* This will not support <some target>!

As described above, we won't support every target that GCC currently does. Three options are possible:

1. We (the GCC community) could build an LLVM to GIMPLE translator. This would probably take about as much work as the GIMPLE -> LLVM translator (about 4000 LOC), which is not a huge project.
2. We could build an LLVM to RTL translator.
3. Target maintainers can build an LLVM port for their target.

In any case, it is certain that GCC will not drop its existing RTL backend any time in the near future.



==== Conclusion

As I mentioned above, this work is currently underway. I anticipate committing this patch to the Apple branch within a month, and am wondering if people are interested in having this functionality on the mainline GCC branch.

This work clearly overlaps with the two recent proposals [1,2]. I suggest that improving LLVM to address deficiencies in it would be easier and more man-hour-efficient than building a whole new interprocedural optimization component and rewriting large portions of the GCC backend. Further, this proposal differs from the other two in that it is already largely implemented and works. Finally, there is no reason that multiple different approaches cannot be developed in parallel, if people desire such an approach.


Thoughtful feedback appreciated,

-Chris

==== References

[1] http://gcc.gnu.org/ml/gcc/2005-11/msg00735.html
[2] http://gcc.gnu.org/ml/gcc/2005-11/msg00783.html
[3] http://llvm.org/docs/LangRef.html
[4] http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.html
[5] http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html
[6] http://llvm.org/docs/Bugpoint.html
[7] http://gcc.gnu.org/wiki/Speedup%20areas
[8] http://llvm.org/releases/1.6/LICENSE.TXT
[9] http://llvm.org/docs/
[10] http://llvm.org/pubs/2004-01-30-CGO-LLVM.html
[11] http://llvm.org/pubs/2003-05-01-GCCSummit2003.html
