Chris - great and interesting overview. Do you have a reading list for more details? Thanks!
Lee Courtney

On Thu, May 6, 2021 at 7:35 PM Chris Zach via cctalk <cctalk@classiccmp.org> wrote:

> > Sort of. But while a lot of things happen in parallel, out of order,
> > speculatively, etc., the programming model exposed by the hardware still is
> > the C sequential model. A whole lot of logic is needed to create that
> > appearance, and in fact you can see that all the way back in the CDC 6600
> > "scoreboard" and "stunt box". Some processors occasionally relax the
> > software-visible order, which tends to cause bugs, create marketing issues,
> > or both -- Alpha comes to mind as an example.
>
> Interesting to see this.
>
> I've been reading a lot recently about the Jupiter/Dolphin project and
> the more I read the more I understand why it just could not be done. At
> the time (and to an extent even now) the only way to really improve a
> system's performance was to pipeline the processor, and the PDP-10
> instruction set just wasn't easy to do that with.
>
> They had a great concept: an instruction fetch/decode system (IBOX), an
> execution engine (EBOX), the obligatory vector processor or FPU (HBOX)
> and of course the memory system (MBOX). Break the process up into steps
> and have the parts all work in parallel to boost performance.
>
> Unfortunately they started to find way too many cases where an indirect
> instruction would be fetched that would be based on the AC, which was
> being changed by another instruction in the EBOX. This would blow out
> all the prefetched work in the pipe, forcing the IBOX to do a costly
> reload.
>
> Likewise branch prediction couldn't be done well because most branches
> and skips depended on the value in the AC which was once again usually
> being modified in the EBOX down the pipe. As soon as it was modified the
> pipe had to be flushed and reloaded.
> It looks like they tried to put
> that logic into the IBOX to catch these issues, but that resulted in a
> flat processor that wasn't going to benefit from any parallelism, an
> endless series of bugs, and an IBOX that was pretty much running with
> its own EBOX.
>
> It got worse when they realized that the extended memory segments in the
> 2060 architecture totally wrecked the concept of an instruction
> decoder/execution box. There were just too many places where an indirect
> instruction to another section which was then based on the ACs would
> result in the IBOX tossing the queue and invalidating the translation
> buffers. Increasing the translation buffer helped (I think that's one of
> the things they did on the final 2065 to make it faster) but they
> couldn't make that big and fast enough. I guess an indirect jump
> instruction based on comparing the AC to an indirect address pointing to
> an extended segment would be enough to make any decoder just cry.
>
> It's sad to read; you can almost see them realizing it was doomed. The
> Foonly F1 was a screamer, but it was basically the KA10 instruction set
> and couldn't run extended memory segments like the 2060. And when they
> tried to do the same thing with the F4 it came out to be a little slower
> than a 2060. I used to think they put only one extended segment in the
> 2020 to cripple the box, but maybe they started running into the same
> problem and ran out of microcode space to try and address it.
>
> Couple this with the fact that much of the 20 series programs were built
> in assembler (and why not, it was an amazing thing to program) and you
> just had too many programs with cool bespoke code that would totally
> trash a pipeline. Fixing compilers to order instructions properly could
> have worked, but since people just wrote in assembler, it wasn't going to
> happen and they weren't about to re-code their app to please the new
> scheduler God.
>
> The VAX instruction set was a lot less beautiful, but could be pipelined
> more easily, especially with the dedicated MMU, so they took the people and
> pipelined the hell out of the 780, resulting in the nifty 8600/8650 and
> later the 8800s. DEC learned their lesson when they built Alpha, and
> even Intel realized that their instruction set needed to be pipelined
> for the Pentium Pro and above processors.
>
> Ah well. I don't think it was evil marketing or VAX monsters that killed
> the KC10, it was simply the fact that the amazing instruction set
> couldn't be pipelined to make it more efficient for hardware and the
> memory management system wasn't as efficient as the PDP-11/VAX MMU concept.

--
Lee Courtney
+1-650-704-3934 cell