So I've just checked in a patch series which accomplishes about half of what I was originally aiming to achieve with gengtype, and I'd like to review what I think should still be done. My ultimate goal - not just with gengtype - is to remove all hardwired kludges and dependencies on target header files from all of the programs run at build time, so that one might (in principle) build GCC targeting wholly different processors without needing to recompile any of the gen* programs. I don't expect to personally achieve this in the foreseeable future, at my current rate of about one substantial patch (series) a year, but perhaps others will find it a project worthy of contributing to.
Most of gengtype's hardwired kludges exist because it does not do preprocessing, and therefore my recommendation is that we work toward a state where gengtype can use libcpp to do that (we already have a preprocessor library, let's use it :) If libcpp is too slow for this application, that should be dealt with by improvements to libcpp.

There are two major obstacles to doing this. First, gengtype currently discards most of the input text in the lexical scanner; only declarations/definitions that are "interesting" make it to the parser. libcpp cannot do this and should not be made to do it. Instead, gengtype's parser should be revised so that it accepts exactly the tokens of C translation phase six, which is what libcpp will give you, and discards "uninteresting" content (like function bodies) itself. A related problem is that gengtype currently treats array length declarations, '[' integer-constant-expression ']' in the C grammar, as string tokens (!), and this needs to change. It should not be necessary to parse expressions, only to scan forward for the balancing close bracket and store a serialized sequence of tokens as text.

The other major obstacle is from libcpp's side: gengtype will need a mechanism (presumably a new callback function) for telling libcpp to ignore #include directives, and a policy to determine which ones should be ignored. The obvious answer of "all of them" does not work, because we will need at least some macros to be visible (otherwise there would be no need for kludges around the unavailability of macros). Hopefully "most of them" will be a feasible answer, though. A related problem is that we currently have no way of telling gengtype about the set of system-header-defined types (size_t, ino_t, etc.), and it really isn't a good idea for gengtype to have to parse system headers. For one thing, how does it know where they are? For another, it's amazing what gremlins lurk in there.
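To make the array-bound suggestion above concrete, here is a rough sketch of "scan forward for the balancing close bracket and store the tokens as text". It is not gengtype code. The function name scan_array_bound is made up for illustration, and it assumes tokens arrive as an array of spelling strings rather than as libcpp token structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gengtype.  TOKS is an array of NTOKS
   token spellings and TOKS[OPEN] must be "[".  Find the matching "]"
   and write the spellings strictly between the two brackets into BUF,
   separated by single spaces.  No expression parsing is done; nested
   brackets are only counted.  Returns the index of the matching "]",
   or -1 if the brackets do not balance or BUF is too small.  */
static int
scan_array_bound (const char *const *toks, int ntoks, int open,
                  char *buf, size_t bufsize)
{
  int depth = 0;
  size_t used = 0;

  assert (open >= 0 && open < ntoks && strcmp (toks[open], "[") == 0);
  if (bufsize == 0)
    return -1;
  buf[0] = '\0';

  for (int i = open; i < ntoks; i++)
    {
      if (strcmp (toks[i], "[") == 0)
        {
          /* The outermost '[' is a delimiter, not part of the text.  */
          if (depth++ == 0)
            continue;
        }
      else if (strcmp (toks[i], "]") == 0)
        {
          /* The ']' that brings the depth back to zero ends the scan.  */
          if (--depth == 0)
            return i;
        }

      /* Append this token's spelling, preceded by a space if needed.  */
      size_t len = strlen (toks[i]);
      size_t need = len + (used > 0 ? 1 : 0);
      if (used + need + 1 > bufsize)
        return -1;
      if (used > 0)
        buf[used++] = ' ';
      memcpy (buf + used, toks[i], len);
      used += len;
      buf[used] = '\0';
    }

  /* Ran out of tokens before the brackets balanced.  */
  return -1;
}
```

Given the tokens of "int a[N + b[2]];" and the index of the first "[", this stores "N + b [ 2 ]" and returns the index of the final "]". The stored text can be emitted verbatim into the generated code, so gengtype never needs to know what N is.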
The immediate benefits of having preprocessing in gengtype are that we could eliminate the kludges for VECs and input.h (it should, however, be mentioned that the input.h mess could vanish if the USE_MAPPED_LOCATION conversion were completed) and that we would not need Flex for a build of GCC (unless treelang were desired). We could also easily cause gengtype to process all structure definitions, thus eliminating the need for a vacuous GTY(()) on all GC-relevant types (the only reason I didn't do that in this patch series is that it would have required several more kludges around the lack of a preprocessor).

A slightly more remote benefit comes more from the new parser's better approximation to the type grammar than from preprocessing, but I wouldn't attempt it until textual skipping in the lexical analyzer has been eliminated, at least. That is, we could teach the parser to recognize GTY((...)) as if it were "just another" type qualifier, rather than a very special case. The major win from that would be that we could recognize a GTY(()) tag on the *definition* of a program-scope global, rather than on its "extern" declaration as is now necessary; that in turn would simplify the logic for deciding what mark routines need to be written where. It might even render some of the gtype-* files unnecessary. (I haven't looked at this in detail.)

It would be nice to eliminate the #includes of a few .def files in gengtype.c. I think that could be done with preprocessing plus real support for enums in gengtype. That might also help with the removal or reduction of the extensive special-case support for the tree and RTL types.

zw