> From: Fritz Mueller

> is it possible for you deduce where Unix _should_ be placing these "bad"
> bits (from file offset octal 4220)?

Yes, it's quite simple: just add the virtual address in the code to the physical address of the bottom of the text segment (given in UISA0). The VA is actually 04200, though: the 04220 includes 020 to hold the a.out header at the start of the command file.

So, with UISA0 containing 01614, that gives us PA:161400 + 04200 = PA:165600, I think. And it wound up at PA:171600 - off by 04000 (higher) - which is obviously an interesting number.

> Maybe a comparison of addresses where the bits should be, with
> addresses where the "bad" copy ends up, could point us at some particular
> failure modes to check in the KT11, CPU, or RK11...

Here's where it gets 'interesting'.

Executing a command with pure text on V6 is a very complicated process. The shell fork()s a copy of itself, and the copy does an exec() system call to overlay the entire memory in the new process with a copy of the command (which sounds fairly simple, at a high level) - but the code path to do the exec() with a pure text is incredibly hairy, in detail. In particular, for a variety of reasons, the memory of the process can get swapped in and out several times during that.

I apparently used to understand how this all worked, see this message:

  https://minnie.tuhs.org/pipermail/tuhs/2018-February/014299.html

but it's so complicated it's going to take a while to really comprehend it again. (The little grey cells are aging too, sigh...)

The interesting point is that when V6 first copies the text in from the file holding the command (using readi(), Lions 6221 for anyone who's masochistic enough to try and actually follow this :-), it reads it in starting from the bottom, one disk block at a time (since in V6, files are not stored contiguously).

So, if it starts from the bottom, and copies the wrong thing from low in the file _up_ to VA:010200, when it later gets to VA:010200 in the file contents, that _should_ over-write the stuff that got put there in the wrong place _earlier_.

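To make that ordering argument concrete, here is a toy model in Python (not anything taken from the V6 sources) of a bottom-up, block-at-a-time load. The names `load_text` and `misplace`, the 16-block text size, and the idea that a whole 01000-byte block lands 04000 too high are all assumptions for illustration. It also ignores the 020-byte a.out header, so file blocks are treated as if they lined up exactly with VAs.

```python
BLOCK = 0o1000  # 512-byte disk block, expressed in bytes (octal 01000)

def load_text(nblocks, misplace=None):
    """Toy model of loading a text segment lowest block first.

    misplace maps a block's correct starting VA to the wrong VA it is
    stored at instead (hypothetical fault model, not from the source).
    Returns a dict: starting VA -> index of the file block last stored there.
    """
    misplace = misplace or {}
    memory = {}
    for blk in range(nblocks):          # bottom of the file first
        va = blk * BLOCK                # where this block belongs
        memory[misplace.get(va, va)] = blk
    return memory

# Assumed fault: the block belonging at VA:04000 lands at VA:010000.
mem = load_text(16, {0o4000: 0o10000})
print(mem[0o10000])     # 8: the later, correct write replaced block 4
print(0o4000 in mem)    # False: nothing was ever stored at VA:04000
```

In this model a single early misplacement does not survive at the higher address: the later write of the block that really belongs there replaces it, and the only lasting damage is the hole left lower down.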
Unless there's _another_ problem which causes that later write to _also_ go somewhere wrong...

So, I'm not sure when this trashage is happening, but because of the above, my _guess_ is that it's in one of the two swap operations on the text (out, and then back in). (Although it might be interesting to look at PA:165600 and see what's actually _there_.)

Unix does swapping of pure texts in a single, multi-block transfer (although not always as an integral number of blocks, as we found out the hard way with the QSIC :-). So my suspicions have now switched back to the RK11...

One way to proceed would be to stop the system after the pure text is first read in (say around Lions 4465), and look to see what the text looks like in main memory at _that_ point. (This will require looking at KT11 registers to see where it's holding the text segment, first.)

If that all looks good, we'll have to figure out how to stop the system after the pure text is read back in (which does not happen in exec(), it's done by the normal system operation to swap in the text and data of a process which is ready to run).

We could also stop the system after the text is swapped out, and key in a short (~a dozen words) program to read the text back in from the swap device, and examine it - although we'd have to grub around in the system a bit to figure out where it got written to. (It might just be easier to stop it at, say, Lions 5196 and look at the arguments on the kernel stack.)

> a suggestion here to check the KT11 address translation adders ... A
> bug in one of the carry lookahead generators used between the bit
> slices of that adder could cause a mistranslation on only a fairly
> selective subset of virtual addresses

This could be happening, but from the reasoning above about the order that the blocks of the text are read in, something would have to interfere with the later read of the higher memory blocks, too, no? So I'd discount the KT11 _for the moment_.

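For anyone who wants to check the relocation arithmetic from the top of this message, here is a minimal Python sketch of the addition the KT11 does for one page. The function name `kt11_translate` is made up for illustration; it handles only the relocation add (no page-length or access checks) and assumes the page register passed in is the right one for the address, which holds here because VA:04200 is in user page 0.

```python
def kt11_translate(par, va):
    """Relocate a virtual address through one page address register.

    Sketch only: models the relocation add, nothing else.
    """
    offset = va & 0o17777                    # low 13 bits: offset within the page
    return ((par << 6) + offset) & 0o777777  # PAR is in 64-byte units; 18-bit PA

UISA0 = 0o1614                 # value quoted in this message
expected = kt11_translate(UISA0, 0o4200)
observed = 0o171600            # where the bad bits actually turned up
delta = observed - expected

print(oct(expected))           # 0o165600
print(oct(delta))              # 0o4000
```

The difference comes out as 04000, which is 2**11, so the bad copy sits exactly one power of two above where it belongs.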
> *IF* that's the case and we can chase the IR trace upstream to the
> place of an unlucky mistranslation, it will be pretty easy to track
> down then in the hw and fix.

It'll be interesting to look at the text after it's read in (i.e. before it's swapped out). If it's OK there, that's pretty conclusive that it _can't_ be the KT11 - because from then on, the kernel doesn't _do anything_ to that binary, except swap it out and in with the RK11.

And since those are both single I/O operations (with swapping on the RK11, at least, which can do multi-block transfers), _and_ the bottom of the text segment comes in OK (so the RK11 is being set up with correct disk and main memory addresses for both the out and in), I can't think of a fault _elsewhere_ in the system that could cause that 'stuff winds up in the wrong place' error.

I know this is complicated, but look at the bright side: we started with three apparently un-connected problems:

 * R5 trashage
 * an 'impossible' MM fault
 * bad text data

The first one turned out to be non-existent (my fault in interpreting the kernel stack in the process core dump), the second was also not really there (although a hardware fault in the console gave us bad data, so there really was a hardware issue there), and now we're down to one - albeit a tricky one.

So we were dealing with two un-related hardware problems - now we're down to one, and hopefully soon will have it isolated to a single sub-system! (And thanks to whoever gave us the voltage tip, that fixed the first one.)

Noel