Hello. This is my first post to this list, so I apologise in advance if I get any of the 'stupid newbie' things wrong in this post. I have spent a while reading the last month's archive.
I'm looking at the need to build a very comprehensive code optimizer and generator for a very strange (antique but still used mainframe) machine. Since I'm really and OS type and in a rational world would never be attempting compiler stuff, I figure I'm better off starting with the work of others and extending it rather than making foolish attempts t start from scratch. The GCC chain looks like a particularly good starting place. There are several things I'll need to do that it doesn't appear from what I've read (so far) that GCC currently does. I'm first wondering if these are comtemplated future things, or things that have been deliberately excluded for good reason, or even perhaps things that once existed and were removed for some reason. I expect there are people here that will know the answers! ;-) The basic project description: The machine is very strange in modern terms. It doesn't have registers, it executes code in reverse Polish, the integer data type is a degenerate form of the floating point data type, pointers are magic, there are no page tables, and the word size isn't a power of 2, just to start. Oh, and the integers (floating point mantissas with exponent of zero) are sign/magnitude. And boolean quantities are even/odd, rather than zero/nonzero. Clearly not a terribly good fit for a normal GCC target, but I have some hope here. For one thing, the actual target will be a soft emulation of the original hard machine, and I have control over the instruction set for the emulator. So I can introduce new instructions that do 2's complement numbers and IEEE math, *but*, I still need to maintain the old stuff too. The idea is to read existing code files, which retain a good deal of symbolic information, and build appropriate trees. Since the code is reverse Polish this this isn't all that hard in many cases. Since GCC supports nested procedures I shouldn't have to do terribly clever hacks to deal with that aspect of the machine. Obviously I have to write a frontend to do this, and obviously it will be a heck of a lot of work. We have code that will take an existing codefile apart and build trees of our own, which can be used to regenerate the existing code. I'd like suggestions on where I might start for a frontend. Should I start with the toy compiler? Should I try to strip down the C frontend? Hopefully I shouldn't start from scratch? A problem (I think) is that I have one or two basic new number formats. I see that GCC doesn't support ones comp or sign-magnitude integers. Understandable, and no complaints. However, my impression at the moment is that I need to add just exactly that form of support. And ideally this woudl be a new data type (rather than a C++ class with methods) that was still arithmetic in nature, so that the optimizer passes can do all the common things to simplify the math. (In particular the compilers that generate this code tend to pessimize simple constants into a series of small literals and insert/rotate instructions; it would be good to do constant folding.) I would also expect that I should be able to define conversions from this representation to 2s comp in some cases - for instance, small positive literals can be done in either arithmetic base, and are far faster in 2s comp than in emulated sign/magnitude instructions. I would hope I could add optimizations that would deal with this. Does it seem feasible to be able to add a new math data type and still expect things to work? Or is this a 'flat impossibility'? Any pointers on what I should start looking at in the compiler? I think I can handle the boolean odd/even problem simply by generating an implicit "and =1" to every result before testing it, and then hoping the optimizer can perhaps convert that to a bit test and branch instruction rather than a literal 'and' in many cases. Does this seem like a reasonable approach, or should I consider implementing a second boolean data type with appropriate conversions? The one other thing I'm concerned (greatly) about is whole program optimization ability. SInce I'm reading an entire program, the "source file" will contain all code in the program. The only outside routines will be in the OS or in dynamic link libraries. Of which there are zillions, but fortunately with decent parameter and return info descriptions. I'm hoping that the optimizer would be able to analyze the entire program and determine ALL places global (or at least global to the current nested procedure) variables are referenced and modified, at least directly. I'd hope that nested procedures could be analyzed sufficiently that decisions could be made on inlining them, and possibly discarding the actual procedure if all invocations get inlined. (It is common to see procedures where the instruction count for parameter passing and reception exceeds the actual procedure code by 2-5 times for small procedures. I'd like to fix that.) I see references in the documentation that leads me to belive that GCC can inline in some cases, but I see other things that make me believe that it generates and optimizes a procedure at a time, rather than taking the entire codeflow into account. But then, I'm still trying to figure out what web resources might be vaguely up to date, and which are antiques. I'm not sufficiently up to speed on the code yet to figure out what is really happening for myself. What would GCC likely to be able to do with whole-program optimization and inlining, when presented with the entire program in one compile unit? Do I have reasonable expectations that it might do a pretty good job? Or is this an area where I'm likely to have to design optimization passes myself? Thanks in advance. I'm sure I'll have more questions later, and I can certainly expand on things that may not be clear above. Loren Wilton