I dunno, this seems like a thing you could better figure out by
trying it and seeing where the problems are than by trying to
anticipate every single possible problem (not that there should be
no design, but that it would be better to start with a design and
iterate on it than to try to get it perfect ahead of time).
I agree that it is pointless to go through the information bit by bit.
On the other hand there may be other things like the typedefs that
serve no purpose in the middle end but do take a lot of space and
require time to traverse.
However, at some point we are going to have to get down to the
discussion of what it means for two types declared in two different
compilation units to be "the same", and the less baggage we have
dragged in that is not relevant to that decision, the less
unpleasant that process will be.
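To make the "the same" question concrete, here is a minimal sketch of
one possible rule: two types match if they are structurally identical
once typedefs are looked through. This is only an illustration, not
the scheme anyone in this thread has proposed. The Base, Typedef,
Pointer and Struct classes and the same_type helper are hypothetical
names invented for the example; they are not GCC tree codes, and the
sketch ignores recursive types, qualifiers and language-specific
rules.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical toy type representation (not GCC trees).
@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Typedef:
    alias: str
    target: object


@dataclass(frozen=True)
class Pointer:
    target: object


@dataclass(frozen=True)
class Struct:
    tag: str
    fields: Tuple[Tuple[str, object], ...]


def strip_typedefs(t):
    """Return a copy of t with every typedef replaced by its target."""
    if isinstance(t, Typedef):
        return strip_typedefs(t.target)
    if isinstance(t, Pointer):
        return Pointer(strip_typedefs(t.target))
    if isinstance(t, Struct):
        return Struct(t.tag, tuple((n, strip_typedefs(ft)) for n, ft in t.fields))
    return t


def same_type(a, b):
    """Assumed rule: equal if structurally identical after stripping typedefs."""
    return strip_typedefs(a) == strip_typedefs(b)


# Unit A declares a field through a typedef; unit B spells the type out.
unit_a = Struct("point", (("x", Typedef("coord_t", Base("int"))), ("y", Base("int"))))
unit_b = Struct("point", (("x", Base("int")), ("y", Base("int"))))
unit_c = Struct("point", (("x", Base("long")), ("y", Base("int"))))
```

Under this assumed rule, same_type(unit_a, unit_b) holds even though
the raw representations differ, while unit_c does not match either.
The less irrelevant information is carried along, the smaller
strip_typedefs has to be.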
Sure, typedefs in C/C++ seem clearly useless. I'm just curious how
you plan to go about deciding whether things are useless in a more
general context. At how fine a granularity do you intend to inspect
the bits? Trees have lots of random stuff that is hard to identify and
unify. Hopefully this will be a good step towards making LTO actually
work with source files that come from different languages.
Does this mean that all language-specific type info will be removed?
More generally, can you detail what your plan is? Is it to remove
specific pieces (like typedefs, what else?) or just hack and slash
random stuff if it gets in your way? I'm more curious about your
approach and "threshold for usefulness" than an abstract statement
about how you will remove useless stuff and keep the useful stuff :)
-Chris