Hello.  This is my first post to this list, so I apologise in advance if I
get any of the 'stupid newbie' things wrong in this post.  I have spent a
while reading the last month's archive.

I'm looking at the need to build a very comprehensive code optimizer and
generator for a very strange (antique but still used mainframe) machine.
Since I'm really and OS type and in a rational world would never be
attempting compiler stuff, I figure I'm better off starting with the work of
others and extending it rather than making foolish attempts t start from
scratch.  The GCC chain looks like a particularly good starting place.

There are several things I'll need to do that it doesn't appear from what
I've read (so far) that GCC currently does.  I'm first wondering if these
are comtemplated future things, or things that have been deliberately
excluded for good reason, or even perhaps things that once existed and were
removed for some reason.  I expect there are people here that will know the
answers!  ;-)

The basic project description:

The machine is very strange in modern terms.  It doesn't have registers, it
executes code in reverse Polish, the integer data type is a degenerate form
of the floating point data type, pointers are magic, there are no page
tables, and the word size isn't a power of 2, just to start.  Oh, and the
integers (floating point mantissas with exponent of zero) are
sign/magnitude.  And boolean quantities are even/odd, rather than
zero/nonzero.

Clearly not a terribly good fit for a normal GCC target, but I have some
hope here.  For one thing, the actual target will be a soft emulation of the
original hard machine, and I have control over the instruction set for the
emulator.  So I can introduce new instructions that do 2's complement
numbers and IEEE math, *but*, I still need to maintain the old stuff too.

The idea is to read existing code files, which retain a good deal of
symbolic information, and build appropriate trees.  Since the code is
reverse Polish this this isn't all that hard in many cases. Since GCC
supports nested procedures I shouldn't have to do terribly clever hacks to
deal with that aspect of the machine.

Obviously I have to write a frontend to do this, and obviously it will be a
heck of a lot of work.  We have code that will take an existing codefile
apart and build trees of our own, which can be used to regenerate the
existing code.  I'd like suggestions on where I might start for a frontend.
Should I start with the toy compiler?  Should I try to strip down the C
frontend?  Hopefully I shouldn't start from scratch?

A problem (I think) is that I have one or two basic new number formats.  I
see that GCC doesn't support ones comp or sign-magnitude integers.
Understandable, and no complaints.  However, my impression at the moment is
that I need to add just exactly that form of support.  And ideally this
woudl be a new data type (rather than a C++ class with methods) that was
still arithmetic in nature, so that the optimizer passes can do all the
common things to simplify the math.  (In particular the compilers that
generate this code tend to pessimize simple constants into a series of small
literals and insert/rotate instructions; it would be good to do constant
folding.)  I would also expect that I should be able to define conversions
from this representation to 2s comp in some cases - for instance, small
positive literals can be done in either arithmetic base, and are far faster
in 2s comp than in emulated sign/magnitude instructions.  I would hope I
could add optimizations that would deal with this.

Does it seem feasible to be able to add a new math data type and still
expect things to work?  Or is this a 'flat impossibility'?  Any pointers on
what I should start looking at in the compiler?

I think I can handle the boolean odd/even problem simply by generating an
implicit "and =1" to every result before testing it, and then hoping the
optimizer can perhaps convert that to a bit test and branch instruction
rather than a literal 'and' in many cases.  Does this seem like a reasonable
approach, or should I consider implementing a second boolean data type with
appropriate conversions?

The one other thing I'm concerned (greatly) about is whole program
optimization ability.  SInce I'm reading an entire program, the "source
file" will contain all code in the program.  The only outside routines will
be in the OS or in dynamic link libraries.  Of which there are zillions, but
fortunately with decent parameter and return info descriptions.

I'm hoping that the optimizer would be able to analyze the entire program
and determine ALL places global (or at least global to the current nested
procedure) variables are referenced and modified, at least directly.  I'd
hope that nested procedures could be analyzed sufficiently that decisions
could be made on inlining them, and possibly discarding the actual procedure
if all invocations get inlined.  (It is common to see procedures where the
instruction count for parameter passing and reception exceeds the actual
procedure code by 2-5 times for small procedures.  I'd like to fix that.)

I see references in the documentation that leads me to belive that GCC can
inline in some cases, but I see other things that make me believe that it
generates and optimizes a procedure at a time, rather than taking the entire
codeflow into account.  But then, I'm still trying to figure out what web
resources might be vaguely up to date, and which are antiques.  I'm not
sufficiently up to speed on the code yet to figure out what is really
happening for myself.

What would GCC likely to be able to do with whole-program optimization and
inlining, when presented with the entire program in one compile unit?  Do I
have reasonable expectations that it might do a pretty good job?  Or is this
an area where I'm likely to have to design optimization passes myself?

Thanks in advance.  I'm sure I'll have more questions later, and I can
certainly expand on things that may not be clear above.

        Loren Wilton

Reply via email to