gengtype future directions

Zack Weinberg Mon, 26 Mar 2007 14:12:42 -0800

So I've just checked in a patch series which accomplishes about half
of what I was originally aiming to achieve with gengtype, and I'd like
to review what I think should still be done.  My ultimate goal - not
just with gengtype - is to remove all hardwired kludges and
dependencies on target header files from all of the programs run at
build time, so that one might (in principle) build GCC targeting
wholly different processors without needing to recompile any of the
gen* programs.  I don't expect to personally achieve this in the
foreseeable future, at my current rate of about one substantial patch
(series) a year, but perhaps others will find it a project worthy of
contributing to.


Most of gengtype's hardwired kludges exist because it does not do
preprocessing, and therefore my recommendation is that we work toward
a state where gengtype can use libcpp to do that (we already have a
preprocessor library, let's use it :)  If libcpp is too slow for this
application that should be dealt with by improvements to libcpp.
There are two major obstacles to doing this.  First, gengtype
currently discards most of the input text in the lexical scanner; only
declarations/definitions that are "interesting" make it to the parser.
libcpp cannot do this and should not be made to do it.  Instead,
gengtype's parser should be revised so that it accepts exactly the
tokens of C translation phase six, which is what libcpp will give you,
and discards "uninteresting" content (like function bodies) itself.  A
related problem is that gengtype currently treats array length
declarations, '[' integer-constant-expression ']' in the C grammar, as
string tokens (!) -- this needs to change.  It should not be necessary
to parse expressions, only to scan forward for the balancing close
bracket and store a serialized sequence of tokens as text.
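
A minimal sketch of that forward scan, assuming the tokens arrive as an array of spellings (with libcpp they would be cpp_token objects instead). The name scan_array_bound and its interface are hypothetical, not existing gengtype code:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of gengtype or libcpp.  TOKS holds NTOKS
   token spellings and TOKS[START] must be "[".  Copies the spellings
   between that bracket and its balancing "]" into BUF, separated by
   single spaces, and returns the index of the balancing "]".  Returns -1
   if the brackets never balance or BUF is too small.  */
static int
scan_array_bound (const char *const *toks, int ntoks, int start,
                  char *buf, size_t bufsize)
{
  int depth = 0;
  size_t used = 0;

  assert (start >= 0 && start < ntoks && strcmp (toks[start], "[") == 0);
  assert (bufsize > 0);
  buf[0] = '\0';

  for (int i = start; i < ntoks; i++)
    {
      if (strcmp (toks[i], "[") == 0)
        {
          if (depth++ == 0)
            continue;           /* The outermost '[' is not stored.  */
        }
      else if (strcmp (toks[i], "]") == 0)
        {
          if (--depth == 0)
            return i;           /* Found the balancing bracket.  */
        }

      /* Append this token's spelling; no expression parsing is done.  */
      size_t len = strlen (toks[i]);
      size_t need = len + (used > 0 ? 1 : 0);
      if (used + need + 1 > bufsize)
        return -1;
      if (used > 0)
        buf[used++] = ' ';
      memcpy (buf + used, toks[i], len);
      used += len;
      buf[used] = '\0';
    }
  return -1;
}
```

Nested brackets inside the bound are kept as text, so the stored string can be emitted verbatim later without gengtype ever evaluating it.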

The other major obstacle is from libcpp's side: gengtype will need a
mechanism (presumably a new callback function) for telling libcpp to
ignore #include directives, and a policy to determine which ones
should be ignored.  The obvious answer of "all of them" does not
work, because we will need at least some macros to be visible
(otherwise there would be no need for kludges around the
unavailability of macros).  Hopefully "most of them" will be a
feasible answer, though.  A related problem is that we currently have
no way of telling gengtype about the set of system-header-defined
types (size_t, ino_t, etc.) and it really isn't a good idea for
gengtype to have to parse system headers.  For one thing, how does it
know where they are?  For another, it's amazing what gremlins lurk in
there.
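
One possible shape for such a policy, as a sketch only. The function should_ignore_include, the follow_list table and its two entries are assumptions made for illustration (the entries are picked because the VEC and input.h kludges are the ones preprocessing is meant to replace). No claim is made that libcpp offers a hook with this signature today:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical allowlist of headers whose macros gengtype would need to
   see.  The entries are illustrative, not a worked-out list.  */
static const char *const follow_list[] = { "vec.h", "input.h" };

/* Hypothetical policy: return true if the #include of FNAME should be
   ignored.  ANGLE_BRACKETS is true for the #include <...> form.  */
static bool
should_ignore_include (const char *fname, bool angle_brackets)
{
  assert (fname != NULL);

  /* Never follow system headers; gengtype should not have to parse them.  */
  if (angle_brackets)
    return true;

  for (size_t i = 0; i < sizeof follow_list / sizeof follow_list[0]; i++)
    if (strcmp (fname, follow_list[i]) == 0)
      return false;

  /* "Most of them" are ignored by default.  */
  return true;
}
```

Defaulting to "ignore" keeps the set of files gengtype must understand small and explicit. It does nothing for the system-header types problem, which would still need a separate answer.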

The immediate benefits of having preprocessing in gengtype are that we
could eliminate the kludges for VECs and input.h (it should, however,
be mentioned that the input.h mess could vanish if the
USE_MAPPED_LOCATION conversion were completed) and that we would not
need Flex for a build of GCC (unless treelang were desired).  We could
also easily cause gengtype to process all structure definitions, thus
eliminating the need for a vacuous GTY(()) on all GC-relevant-types
(the only reason I didn't do that in this patch series is because it
would have required several more kludges around the lack of a
preprocessor).

A slightly more remote benefit comes more from the new parser's better
approximation to the type grammar than from preprocessing, but I
wouldn't attempt it until textual skipping in the lexical analyzer has
been eliminated, at least.  That is, we could teach the parser
to recognize GTY((...)) as if it were "just another" type qualifier,
rather than a very special case.  The major win from that would be
that we could recognize a GTY(()) tag on the *definition* of a
program-scope global, rather than its "extern" declaration as is now
necessary; that in turn would simplify the logic for deciding what
mark routines need to be written where.  It might render some of the
gtype-* files unnecessary, even.  (I haven't looked at this in
detail.)
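
To make the difference concrete, here is a small compilable sketch. The variable names and struct node are hypothetical. GTY is defined to expand to nothing, which is how the compiler proper sees it. The second form is the proposed one and assumes the parser change described above:

```c
/* For the compiler GTY expands to nothing; only gengtype reads it.  */
#define GTY(x)

struct node;   /* Hypothetical GC-allocated type.  */

/* Today: the tag must sit on the "extern" declaration (normally in a
   header), and the definition carries no annotation.  */
extern GTY(()) struct node *old_style_root;
struct node *old_style_root;

/* Proposed: the tag sits on the definition itself and is parsed like
   any other qualifier.  */
extern struct node *new_style_root;
GTY(()) struct node *new_style_root;
```

With the tag on the definition, gengtype would know which source file owns the variable, and that is the information it needs to decide where the mark routine belongs.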

It would be nice to eliminate the #includes of a few .def files in
gengtype.c.  I think that could be done with preprocessing plus real
support for enums in gengtype.  That might also help with the removal
or reduction of the extensive special-case support for the tree and
RTL types.

zw
