> On Jun 15, 2024, at 1:41 PM, Chuck Guzis via cctalk <cctalk@classiccmp.org>
> wrote:
>
> I'm certain that Paul has done his share of this, but an art on the CDC
> 6600 was hand-scheduling instruction execution. There was at least one
> class for this--and probably more. The CPU could issue one instruction
> every cycle, assuming that there were no conflicts. The 6600 had
> several functional units whose operation could overlap.
I learned it by reading OS code and adopted some of it in my own work, though
not much, because I actually only worked on the 6500, which doesn't have
multiple functional units.
Writing good code for those machines was further complicated by the fact that
instructions were either 15 or 30 bits long (1/4 or 1/2 of a 60-bit word),
could not span a word boundary, and branches could only go to the start of a
word. So code tended to contain NOPs, supplied by the assembler, to pad out
partly filled words. Avoiding them made the code faster and, of course, smaller.
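To make the padding cost concrete, here is a minimal Python sketch (my own illustration, not how the real assembler worked internally) that packs a sequence of instruction lengths into 60-bit words under the two rules above: an instruction may not straddle a word, and a branch target must start a new word. The function name `pack` and the `(length_bits, is_branch_target)` input format are hypothetical, and it ignores any other forcing rules the real assembler applied.

```python
WORD_BITS = 60   # CDC 6000-series word size
PARCEL = 15      # instructions are 15 or 30 bits; NOP padding uses 15-bit parcels


def pack(instrs):
    """Pack (length_bits, is_branch_target) pairs into 60-bit words.

    Hypothetical model: an instruction may not straddle a word boundary,
    and a branch target must begin a fresh word. Unused space in a word
    is filled with 15-bit NOPs. Returns (words, nop_count).
    """
    words = []
    current = []
    used = 0
    nops = 0

    def flush():
        # Pad the current word out to 60 bits with NOP parcels, then close it.
        nonlocal current, used, nops
        while used < WORD_BITS:
            current.append("NOP")
            used += PARCEL
            nops += 1
        words.append(current)
        current, used = [], 0

    for length, is_target in instrs:
        # Start a new word if this is a branch target or it would not fit.
        if current and (is_target or used + length > WORD_BITS):
            flush()
        current.append(length)
        used += length
    if current:
        flush()
    return words, nops
```

The same five instructions, reordered (assuming their dependences allow it), go from three words with four NOPs to two words with none:

```python
pack([(15, False), (30, False), (30, False), (15, False), (30, False)])  # 3 words, 4 NOPs
pack([(15, False), (15, False), (30, False), (30, False), (30, False)])  # 2 words, 0 NOPs
```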
The other complication was the fairly small register set, and the fact that
loads could go only into X1..X5 (by setting A1..A5) while stores could only
come from X6 or X7 (by setting A6 or A7). So a memcpy needs a
register-to-register move for every word copied. That move takes 3 cycles on a
6600, so a skillful memcpy implementation would use two load registers, both
store registers, and two separate functional units for the moves (one via the
"boolean" unit and one via the "shift" unit). I remember my bafflement the
first time I saw a shift (by zero) used just to do a register-to-register move;
on a 6500 you would have no reason to write that.
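Why two units help can be seen in a deliberately simplified Python model (an illustration under stated assumptions, not a 6600 simulator): instructions issue in program order at most one per cycle, a functional unit that accepts a move stays busy for 3 cycles, and a move must wait until its unit is free. Loads, stores, memory timing, and register reservations are all ignored, and the function name `move_schedule` and the unit labels are hypothetical.

```python
def move_schedule(n_moves, units, latency=3):
    """Return the issue cycle of each register-to-register move.

    Toy model: in-order issue, at most one instruction per cycle; each
    functional unit is busy for `latency` cycles after accepting a move;
    moves are handed to the listed units round-robin.
    """
    free_at = {u: 0 for u in units}  # cycle at which each unit is free again
    issues = []
    next_slot = 0                    # earliest cycle the next instruction may issue
    for i in range(n_moves):
        unit = units[i % len(units)]
        t = max(next_slot, free_at[unit])  # stall until the unit is free
        issues.append(t)
        free_at[unit] = t + latency
        next_slot = t + 1
    return issues


one_unit = move_schedule(8, ["boolean"])           # issues every 3 cycles
two_units = move_schedule(8, ["boolean", "shift"])  # two moves per 3 cycles
```

Under these assumptions, eight moves finish at cycle 24 with only the boolean unit (last issue at 21, plus 3) but at cycle 13 when alternating with the shift unit (last issue at 10, plus 3).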
I once crashed the PLATO system in the middle of the day, at peak load (600
users logged on), because I had slowed down a critical terminal output
processing step and there was no flow control at that point. My bosses were
NOT happy. I solved the issue by cleaning up that block of code to eliminate
all the NOPs; the result was both shorter and faster than the previous version,
while still delivering the new feature. :-)
paul