> On Jun 13, 2024, at 6:22 PM, Jonathan Stone via cctalk 
> <cctalk@classiccmp.org> wrote:
> 
> 
> On Thursday, June 13, 2024 at 03:00:22 PM PDT, Maciej W. Rozycki via cctalk 
> <cctalk@classiccmp.org> wrote:
> 
>> The architecture designers cheated however even in the original ISA in
>> that moves from the MD accumulator did interlock.  I guess they figured
>> people (either doing it by hand or by writing a compiler) wouldn't get
>> that right anyway. ;)
> 
> I always assumed that was because the latency of multiply, let alone divide, 
> was far too many cycles for anyone to plausibly schedule "useful" 
> instructions into. Wasn't r4000 divide latency over 60 cycles?

Probably, because divide is inherently an iterative operation and is usually 
implemented to produce one bit of result per cycle.  A notable exception is the 
CDC 6600, which throws a whole lot of logic at the problem to produce two bits 
of result per cycle.  The usual divide amounts to a trial subtraction and a 
shift; the 6600 implementation does THREE trial subtractions concurrently.  Not 
cheap when you're using discrete transistor logic.
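
For concreteness, here is a rough C sketch (mine, not anything from the 
original message) of that one-bit-per-cycle trial-subtract-and-shift loop; a 
simple hardware restoring divider does essentially the same thing with a 
subtractor and a pair of shift registers:

    #include <stdint.h>

    /* Restoring division: one quotient bit per loop iteration, just as a
       simple hardware divider retires one bit per cycle.
       Caller must ensure divisor != 0. */
    static uint32_t restoring_divide(uint32_t dividend, uint32_t divisor,
                                     uint32_t *remainder)
    {
        uint64_t rem = 0;                  /* partial remainder */
        uint32_t quot = 0;

        for (int i = 31; i >= 0; i--) {
            rem = (rem << 1) | ((dividend >> i) & 1);  /* shift in next bit */
            if (rem >= divisor) {          /* trial subtraction succeeds */
                rem -= divisor;
                quot |= 1u << i;           /* quotient bit is 1 */
            }                              /* else "restore": keep rem as-is */
        }
        *remainder = (uint32_t)rem;
        return quot;
    }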

Multiply is an entirely different matter; it can be done in a few cycles if you 
throw enough logic at the problem.  Signal processors are an extreme example of 
this, because multiply/add sequences are the essence of what they need to do.  
This is also why Alpha omitted integer divide entirely and made programs do a 
multiply by the reciprocal instead.
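
As a sketch of the reciprocal trick (again mine, not from the post): for a 
constant divisor, a compiler can replace the divide with a widening multiply by 
a precomputed fixed-point reciprocal followed by a shift, e.g. unsigned n / 10:

    #include <stdint.h>

    /* Divide-by-constant via multiplication: 0xCCCCCCCD = ceil(2^35 / 10),
       so (n * 0xCCCCCCCD) >> 35 equals n / 10 for every 32-bit unsigned n.
       A widening multiply and a shift replace the divide instruction. */
    static uint32_t div10(uint32_t n)
    {
        return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
    }

Variable divisors on such a machine fall back to a software divide routine.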

The best argument for doing interlocking in the hardware isn't that it's hard 
for software to get right.  Code generators can do it, and that's a one-time 
effort.  But the required delays are often dependent on variables that are not 
known at compile time, for example load/store delays, or branches taken/not 
taken.  Run-time interlocks deal with the actual conflicts as they occur, while 
compiler or programmer conflict avoidance has to assume the worst case.

        paul
