[cctalk] Re: Delay slots, was: Re: Re: early microprocessor limited pipelining [was: Intel 8086 - 46 yrs. ago]

Paul Koning via cctalk Sat, 15 Jun 2024 12:39:19 -0700

> On Jun 15, 2024, at 1:41 PM, Chuck Guzis via cctalk <cctalk@classiccmp.org> wrote:
> 
> I'm certain that Paul has done his share of this, but an art on the CDC
> 6600 was hand-scheduling instruction execution.  There was at least one
> class for this--and probably more.  The CPU could issue one instruction
> every cycle, assuming that there were no conflicts.  The 6600 had
> several functional units whose operation could overlap.

I learned it from reading OS code and adopted some of it in my own work, but not
much, because I only ever worked on the 6500, which doesn't have multiple
functional units.

Writing good code for those machines was further complicated by the fact that
instructions were either 1/4 or 1/2 word long (15 or 30 bits of a 60-bit word),
could not be split across word boundaries, and branches could only go to the
start of a word.  So there tended to be NOPs to pad out the word, which the
assembler would supply.  Avoiding them would make the code faster and, of
course, smaller.
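A rough Python sketch of the packing rule just described, for readers who haven't met it.  It assumes each instruction is tagged with its size (15 or 30 bits) and whether it is a branch target, and fills 60-bit words greedily, padding with 15-bit NOPs when the next instruction won't fit or when a branch target has to start a fresh word.  The `pack` function and its tuple format are hypothetical, and a real assembler had more rules than this.

```python
# Simplified model of CDC 6000-series instruction packing.
# Assumption: only the rules described in the text; a real assembler had more.
WORD_BITS = 60   # one CDC 6600 word
NOP_BITS = 15    # padding uses 15-bit no-op instructions


def pack(instrs):
    """Greedily pack (name, size_bits, is_branch_target) tuples into words.

    Returns (words, nop_count), where each word is a list of names.
    """
    words = []
    current = []
    used = 0
    nops = 0

    def flush():
        # Pad the rest of the current word with NOPs and start a new word.
        nonlocal current, used, nops
        while used < WORD_BITS:
            current.append("NOP")
            used += NOP_BITS
            nops += 1
        words.append(current)
        current, used = [], 0

    for name, size, is_target in instrs:
        if size not in (15, 30):
            raise ValueError(f"{name}: size must be 15 or 30 bits")
        # A branch target must begin a word; a 30-bit instruction
        # may not straddle a word boundary.
        if (is_target and used > 0) or used + size > WORD_BITS:
            flush()
        current.append(name)
        used += size

    if used > 0:
        flush()
    return words, nops


if __name__ == "__main__":
    # Same mix of instructions in two orders.
    sizes_a = [30, 15, 30, 15, 30, 15, 30, 15]
    sizes_b = [30, 30, 15, 15, 15, 15, 30, 30]
    for sizes in (sizes_a, sizes_b):
        words, nops = pack([(f"I{i}", s, False) for i, s in enumerate(sizes)])
        print(len(words), "words,", nops, "NOPs")
```

Alternating 30- and 15-bit instructions wastes one 15-bit slot per word (4 words, 4 NOPs), while the reordered version packs into 3 words with no NOPs.  That is the kind of gain hand reordering was after, assuming the new order still respects the data dependencies between instructions.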

The other complication was a fairly limited set of registers, and the fact that
loads would go only to X1..X5 while stores could only come from X6 or X7.  So a
memcpy would involve a register-to-register transfer.  That takes 3 cycles on a
6600, so a skillful memcpy implementation would use two load registers, both
store registers, and two separate functional units for the R-R move (one via
the "boolean" unit and one via the "shift" unit).  I remember my bafflement the
first time I saw a shift (by zero) used just to do a register-to-register move;
on a 6500 you wouldn't have any reason to write that.
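To see why splitting the moves across two units pays off, here is a toy Python model that counts only the register-to-register move step of such a memcpy.  It assumes, as above, that a move takes 3 cycles, that at most one instruction issues per cycle, and that a unit can't accept a new instruction until its previous one finishes (the 6600's functional units were not pipelined).  It ignores memory timing, the increment units that do the address arithmetic, and register conflicts.  The function name `schedule_moves` is hypothetical.

```python
# Toy timing model. Assumptions: 3-cycle moves, one issue per cycle,
# non-pipelined units; memory, increment units and register conflicts ignored.
def schedule_moves(n_moves, n_units, latency=3):
    """Issue n_moves independent register-to-register moves in order.

    Each move goes to whichever unit frees up first.
    Returns (issue_cycles, finish_cycle).
    """
    unit_free = [0] * n_units  # cycle at which each unit can take a new op
    issue_cycles = []
    next_issue = 0             # at most one instruction issues per cycle
    finish = 0
    for _ in range(n_moves):
        u = min(range(n_units), key=lambda i: unit_free[i])
        # Issue stalls until both the issue slot and the chosen unit are free.
        t = max(next_issue, unit_free[u])
        issue_cycles.append(t)
        unit_free[u] = t + latency
        finish = max(finish, t + latency)
        next_issue = t + 1
    return issue_cycles, finish


if __name__ == "__main__":
    print(schedule_moves(4, 1))  # every move through one unit: ([0, 3, 6, 9], 12)
    print(schedule_moves(4, 2))  # alternating two units:       ([0, 1, 3, 4], 7)
```

For 100 moves the model gives 300 cycles through a single unit versus 151 when alternating between two, close to a twofold gain on that step.  Real loop timing would also depend on the memory and address-arithmetic limits the model leaves out.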

I once crashed the PLATO system at midday, when the load hit its peak (600 users
logged on), because I had slowed down a critical terminal output processing step
and that part of the machinery had no flow control.  My bosses were NOT happy.  I
solved the problem by cleaning up that block of code to eliminate all NOPs; the
result was both shorter and faster than the previous version while still
delivering the new feature.  :-)

        paul
