> On Jun 10, 2024, at 12:18 PM, Joshua Rice via cctalk <cctalk@classiccmp.org>
> wrote:
>
> On 10/06/2024 05:54, dwight via cctalk wrote:
>> No one is mentioning multiple processors on a single die, and caches that
>> are bigger than the complete RAM of most systems of that era.
>> Clock speed was dealt with by clever register renaming, pipelining and
>> prediction.
>> Dwight
>
> Pipelining has always been a double-edged sword. Splitting the instruction
> cycle into smaller, faster chunks that can run simultaneously is a great
> idea, but if the latency of an individual instruction gets longer, failed
> branch predictions and the pipeline flushes that follow can truly bog down
> the real-life instructions per second. This is ultimately what made the
> NetBurst architecture the dead end it became.
RISC can do pipelining much more easily (as Seymour Cray demonstrated in the
1960s, first with the CDC 6600 and then with the fully pipelined CDC 7600).
For one thing, "bypass" is doable, and widely used, in machines that combine
pipelining with multiple functional units. I remember it in the SiByte 1250
and the Raza XLR (both MIPS64, early 2000s), but I assume it was done well
before then.
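
To make the "bypass" idea concrete, here is a back-of-the-envelope sketch in
C (my own toy model, not anything from those machines): it counts cycles for
a short chain of dependent adds on an idealized five-stage in-order pipeline,
with and without a forwarding network. The five-stage depth and the two-cycle
no-bypass penalty are just assumptions for the model.

/* Toy model only: an idealized 5-stage in-order pipeline running a
 * chain of dependent register-to-register adds.  Without bypass, a
 * consumer must wait for the producer's writeback; with bypass, the
 * result is forwarded and no bubbles are needed.  The 2-cycle penalty
 * and 5-stage depth are assumptions for illustration. */
#include <stdio.h>

typedef struct {
    int dest;        /* register written */
    int src1, src2;  /* registers read */
} Insn;

static int run_cycles(const Insn *prog, int n, int bypass)
{
    int total = n + 4;  /* one issue per cycle, plus 4 cycles to drain the pipe */

    for (int i = 1; i < n; i++) {
        int depends = (prog[i].src1 == prog[i - 1].dest ||
                       prog[i].src2 == prog[i - 1].dest);
        if (depends && !bypass)
            total += 2;  /* stall until the producer's result is written back */
    }
    return total;
}

int main(void)
{
    /* r3 = r1 + r2;  r5 = r3 + r4;  r7 = r5 + r6 */
    Insn prog[] = { {3, 1, 2}, {5, 3, 4}, {7, 5, 6} };
    int n = (int)(sizeof prog / sizeof prog[0]);

    printf("no bypass: %d cycles\n", run_cycles(prog, n, 0));
    printf("bypass:    %d cycles\n", run_cycles(prog, n, 1));
    return 0;
}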
> DEC came across another issue with the PDP-11 vs the VAX. Although the
> pipelined architecture of the VAX was much faster overall than the PDP-11,
> the time for a single instruction was much increased, which led customers
> requiring real-time operation to stick with the PDP-11, as it was much
> quicker in those operations. This, along with its large software
> back-catalog and established platform, led to the PDP-11 outliving its
> successor.
>
> Josh Rice
That reminds me of the Motorola 68040. I did the fastpath for an FDDI switch
(doing packet switching in software) on one of those. I discovered that the
VAX-like addressing modes that look so nice on the 68040 take a bunch of
cycles, but there was a "RISC subset" using just the simplest addressing modes
that gave single-cycle execution. So I limited my code to just those.
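
For flavor, here is roughly what sticking to that subset means, sketched in C
rather than the original assembly; the 68040 mnemonics in the comments are
approximations, and what a given compiler actually emits will vary, so treat
this as an illustration, not the original fastpath code.

/* Illustration only: the same word copy written two ways.  The indexed
 * form invites the fancier 68020/68040 addressing modes (scaled index,
 * or worse), while the pointer form maps onto plain register-indirect
 * with post-increment, i.e. the simple "RISC subset" modes. */
#include <stddef.h>
#include <stdint.h>

/* Indexed form: each access is base + scaled index,
 * e.g. move.l (a1,d0.l*4),(a0,d0.l*4)  -- multi-cycle addressing. */
void copy_indexed(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Post-increment form: only the simplest modes,
 * e.g. move.l (a1)+,(a0)+              -- the single-cycle subset. */
void copy_postinc(uint32_t *dst, const uint32_t *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}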
The other weirdness was branch prediction. The 68040 had no branch prediction
cache; instead it statically predicted every branch as taken. Note the
difference from the usual practice, which is to predict backward branches as
taken and forward ones as not taken. No problem either way, but it meant that
the assembly code looked a bit odd: in an if/then/else block, the unlikely
case came immediately after the branch (the fall-through, not the predicted
case) and the likely case came after that (branch taken, as predicted).
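
In C the effect looks something like the sketch below (my reconstruction, not
the original code; the frame-length check and the helper names are made up
for the example). The test is written so the common case is the taken branch,
which on a predict-taken machine like the '040 leaves the rare case sitting
on the fall-through path right after the branch.

/* Sketch only: laying out an if/then/else for a CPU that statically
 * predicts every branch as taken.  MIN_FRAME_LEN and the two helpers
 * are hypothetical; the point is the layout, with the unlikely case on
 * the fall-through path and the likely case at the branch target. */
#include <stdio.h>

#define MIN_FRAME_LEN 64   /* assumed threshold, for illustration */

static int drop_runt(unsigned len)      { (void)len; return -1; }
static int forward_frame(unsigned len)  { (void)len; return 0; }

static int handle_frame(unsigned len)
{
    if (len >= MIN_FRAME_LEN)   /* almost always true: branch taken, as predicted */
        goto forward;

    /* fall-through: the unlikely case sits right after the branch */
    return drop_runt(len);

forward:                        /* the likely case lives at the branch target */
    return forward_frame(len);
}

int main(void)
{
    printf("%d %d\n", handle_frame(1500), handle_frame(12));
    return 0;
}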
It was fun to do 60k packets per second on a 25 MHz processor...
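(25 MHz divided by 60k packets per second is only a bit over 400 CPU cycles
per packet.)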
paul