On Jun 4, 2008, at 8:27 AM, Kenneth Zadeck wrote:

It is certainly not going to be possible to do this for all IPA passes; in particular, any pass that requires the function body to be reanalyzed as part of the analysis pass will either not be done or will be degraded so that it does not use this mechanism. But for a large number of passes this will work.

How this scales to Google-sized applications remains to be seen. The point is that there is a rich space with a complex set of tradeoffs to be explored with lto. The decision to farm the function bodies out to other processors because we "cannot" have all of the function bodies in memory will have a dramatic effect on what gcc/lto/whopr compilation will be able to achieve.

I agree with a lot of the sentiment that you express here, Kenny. In LLVM, we've intentionally taken a very incremental approach:

1) start with all code in memory and see how far you can get. It seems that on reasonable developer machines (e.g. 2GB of memory) we can handle C programs on the order of a million lines of code, or C++ code on the order of 400K lines of code, without a problem in LLVM.

2) start leaving function bodies on disk, access them lazily, and use a cache manager to keep things in memory when needed. I think this will let us scale to code bases of tens or hundreds of millions of lines. I see no reason to take a whopr approach just to be able to handle large programs.
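To make (2) a bit more concrete, here is a rough sketch of the kind of cache manager I have in mind. This is not LLVM's actual lazy bitcode-reading API; the names, the LRU policy, and the on-disk format are made up purely for illustration, and a real implementation would key off a symbol index in the IR file:

// Hypothetical sketch of a lazy function-body cache; names are illustrative.
#include <cstdio>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for a deserialized function body (IR, CFG, etc.).
struct FunctionBody {
  std::string name;
  std::vector<char> ir;  // opaque serialized IR in this example
};

// Keeps at most `capacity` bodies resident; older ones are dropped and
// re-read from disk on the next access (simple LRU policy).
class FunctionBodyCache {
public:
  FunctionBodyCache(std::string archivePath, size_t capacity)
      : path_(std::move(archivePath)), capacity_(capacity) {}

  // Returns the body for `name`, loading it from disk if not resident.
  const FunctionBody &get(const std::string &name) {
    auto it = index_.find(name);
    if (it != index_.end()) {
      // Cache hit: move the body to the front of the LRU list.
      lru_.splice(lru_.begin(), lru_, it->second);
      return *it->second;
    }
    // Cache miss: deserialize from the on-disk archive.
    lru_.push_front(loadFromDisk(name));
    index_[name] = lru_.begin();
    evictIfNeeded();
    return lru_.front();
  }

private:
  FunctionBody loadFromDisk(const std::string &name) {
    // Real code would seek to an offset recorded in a symbol index and
    // deserialize the IR; here we just fabricate an empty body.
    std::printf("loading %s from %s\n", name.c_str(), path_.c_str());
    return FunctionBody{name, {}};
  }

  void evictIfNeeded() {
    while (lru_.size() > capacity_) {
      index_.erase(lru_.back().name);
      lru_.pop_back();  // drop the least recently used body
    }
  }

  std::string path_;
  size_t capacity_;
  std::list<FunctionBody> lru_;
  std::unordered_map<std::string, std::list<FunctionBody>::iterator> index_;
};

int main() {
  FunctionBodyCache cache("app.bc", /*capacity=*/2);
  cache.get("foo");
  cache.get("bar");
  cache.get("foo");  // hit: no reload
  cache.get("baz");  // evicts "bar"
  cache.get("bar");  // reloaded from disk
}

The interesting policy questions (what to pin, how to size the cache relative to the pass being run) are where the real work is; the mechanism itself is straightforward.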

Independent of program size is the efficiency of LTO itself. To me, allowing LTO to scale and work well on 2- to 16-way shared-memory machines is the first interesting order of business, just because that is what many developers have on their desks. Once that issue is nailed, going across a cluster is an interesting next step.
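As a rough illustration of the shared-memory side of this: once the cross-module analysis is done, the late per-function optimization and codegen work is largely independent, so fanning it out over a handful of cores is conceptually simple. Again, this is just a sketch with made-up names, not how LLVM's pass manager is actually driven:

// Hypothetical sketch: fanning per-function work out over worker threads.
#include <atomic>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Stand-in for running the function-local optimization and codegen
// pipeline on one function; a real LTO compiler would consume IR here.
static void optimizeAndCodegen(const std::string &fn) {
  std::printf("optimizing %s\n", fn.c_str());
}

// Distribute independent function-at-a-time work over N worker threads.
static void parallelCodegen(const std::vector<std::string> &functions,
                            unsigned numThreads) {
  std::atomic<size_t> next{0};
  auto worker = [&] {
    // Each thread claims the next unprocessed function until none remain.
    for (size_t i = next++; i < functions.size(); i = next++)
      optimizeAndCodegen(functions[i]);
  };
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < numThreads; ++t)
    pool.emplace_back(worker);
  for (auto &th : pool)
    th.join();
}

int main() {
  std::vector<std::string> fns = {"main", "foo", "bar", "baz"};
  unsigned n = std::thread::hardware_concurrency();
  parallelCodegen(fns, n ? n : 2);
}

The hard part, of course, is the interaction with the cache manager above and with passes that want to see more than one function at a time, not the threading itself.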

In the world I deal with, most code is built out of a large number of moderate-sized libraries/plugins, not as a gigantic monolithic a.out file. I admit that this shifts the emphasis we've been placing onto making the integration transparent, supporting LTO across code bases with pieces missing, etc., and not onto support for ridiculously huge code bases.

I guess one difference between the LLVM and GCC approaches stems from the "constant factor" order-of-magnitude efficiency difference between llvm and gcc. If you can't reasonably hold a few hundred thousand lines of code in memory, then you need more advanced techniques in order to be generally usable for moderate-sized code bases.

-Chris
