Hear, hear! I think using ecj as a gcj front end sounds like a terrific idea!
Kind regards,
Thomas Hallgren

Tom Tromey wrote:
Now that the GPL v3 looks as though it may be EPL-compatible, the time has come to reconsider using the Eclipse Java compiler ("ecj") as our primary gcj front end. This has both political and technical ramifications, which I discuss below. Steering committee members, please read through if you would. I think this requires some resolution at the SC/FSF level.

First, a brief note on gcjx. I had intended gcjx to serve not only as a cleanly written replacement for the current gcj, but also as a model for how GCC front ends should be written in the future; in particular I think writing it as a library and separating out the tree-generating code from the bulk of the compiler remain good ideas. I enjoyed, and continue to enjoy, the writing of gcjx. However, in this case I think that pleasure must give way to the greater needs of efficiency and cross-community cooperation.

Motivation.

The motivation for this investigation is simple: sharing code is preferable to working in isolation. In particular this change would let us offload much of the front end maintenance onto a different group.

Ecj has a good front end (much better than the current gcj) and decent bytecode generation. It is fully 1.5-compliant and, apparently, is tested against the TCK by the upstream maintainers (we gcj developers don't have TCK access). It also has some improvements for 1.6 (stack maps). Upstream is very active.

gcjx by comparison is unfinished and really has just a single full-time developer, me.

Technical approach.

Historically we've wanted to have a 'native' Java-source-code-reading compiler, that is, one which parses Java sources and converts them directly to trees. From what I can remember this was based on three things:

* In the past the compiler handled loops built with LOOP_EXPR better than it handled loops built "by hand" out of GOTO_EXPRs. My understanding is that this has changed since tree-ssa. The issue here was that we made no attempt to rebuild a LOOP_EXPR from Java bytecode.
* The .java front end could do a "constant array" optimization. This optimization has not worked for quite some time (there's a PR). In any case we could implement this for bytecode if it matters.
* The .java front end could more efficiently handle class literals. With the new 1.5 'ldc' bytecode extension, this is no longer a problem.

In other words, as far as I can remember, our old reasons for wanting this are obsolete.

I think our technical approach should be to have ecj emit class files, which would then be compiled by jc1. In particular I think we could change ecj to emit a single .jar file. This has a few benefits: it would give -save-temps meaning for gcj, it would let us more easily drop ecj into the existing specs mechanism, and it would require very few changes to the upstream compiler.

An alternative approach would be to directly link ecj to the gcc back end. However, this looks like significantly more work, requiring much more hacking on the internals of the upstream compiler. I suspect that this won't be worth the effort.

In my preferred approach we would simply delete a portion of the existing gcj and turn jc1 into a purely bytecode-based compiler. Then we would proceed to augment it with all the bits needed for proper 1.5 support.

ecj is written in Java. This will complicate the bootstrap process. However, the situation will not be quite as severe as the Ada situation, in that it ought to be possible to bootstrap gcj using any Java runtime, including mini ones such as JamVM -- at least, assuming that the suggested implementation route is taken.

Politics.

I don't know whether the FSF or the GCC SC would let us import ecj, even assuming it is actually GPL compatible. SC members, please discuss.

We don't know how upstream would react. I think this is a fairly minor risk.

It is unclear to me whether we would even need to rely on GPL v3 if we went with the separate-ecj route. Any comments here?
In the exec-via-specs approach we're invoking ecj as a separate executable, much the same way we exec 'as' or 'ld'. Comments on this from license-oriented folks would be appreciated.

Summary.

I think this would be the most efficient way to achieve 1.5 language compatibility for gcj, and it would also make future language changes less expensive. Given the scope of the entire gcj project, especially when the scarcity of resources devoted to it is taken into account, this is significant enough to warrant the change.

Tom